ABSTRACT

This Final Report describes the activities, results, and 
conclusions of a three-year project whose major purpose was to 
improve "test-taking" skills of learning disabled (LD) and 
behaviorally disordered (BD) children, with respect to 
standardized achievement tests. Project activities are generally 
subdivided into three areas. The first area of investigation was
concerned with evaluating potential cognitive and affective 
deficits with respect to test taking. Based upon these findings, 
the second area of project activities included experimental 
efforts to train test-taking skills to LD and BD students. In a
third area of inquiry, a series of investigations examined the
role and utility of achievement tests in special education. 
Overall, it was concluded that (a) LD and BD students do exhibit 
deficits on cognitive and affective aspects of test-taking, (b) 
these deficits can be partially remediated through training, to 
the extent that scores reliably improve, and (c) standardized 
achievement testing provides an important function in special 
education, but additional measures are equally important. Project 
products and publications are included in the Appendices. 

PROJECT OVERVIEW

The primary objective of this project (Taylor & Scruggs, 
1983) was to determine whether scores on standardized achievement 
tests could be improved through a combination of reinforcement, 
practice, and training of "test-taking skills;" that is, those 
skills which refer to understanding of the most efficient means to 
take a test rather than knowledge of the content area (see
"Research in Progress," Appendix A). Such training, if 
successful, would likely improve the validity of resulting test 
scores in that a potential source of error, i.e., difficulty with 
format, testing conditions, etc., would be eliminated. In 
addition to the primary objective, two related areas of inquiry 
were investigated. First, several studies were undertaken to 
determine any possible cognitive or affective deficits 
contributing to lowered "test-taking skills," and subsequently, 
test scores. In addition, since deficits were uncovered, a 
smaller number of investigations was undertaken to examine the 
value of group-administered, standardized achievement tests. 

Cognitive and Affective Correlates of
Test-Taking Performance 

When this project was initially conceived, it was assumed
that materials development would not be necessary, as materials 
had been developed from a prior project and were at that time 
being validated. Since this project was funded, however, it has 
been determined that those materials, as implemented, were not 
effective in increasing the performance of students in regular
education classes on standardized achievement tests. It was, 
therefore, thought necessary to initiate a series of studies to 
evaluate what specific skills lower-functioning students may lack 
with respect to test taking, and to develop a new set of materials 
which might more specifically address these needs. 
Accomplishments are described below by each task. 

Test-Taking Skills Deficits of Learning
Disabled and Behaviorally Disordered Students

Initial investigations. A shorter version of the Stanford 
Achievement Test (reading subtests), questionnaire form, and 
follow-along sheet, were developed in order to evaluate the skills 
students spontaneously employed in test-taking situations
(Scruggs, Bennion, & Lifson, 1985). These materials were utilized 
in several studies to acquire this information. Students were 
selected from two remedial and one enrichment program from each of 
Grades 1 through 7. Students were individually administered 
selected subtests of the Stanford Achievement Test. They were 
asked for their level of confidence for each answer and the 
strategies they had chosen for answering the questions. It was 
determined that a complete hierarchy of strategies existed with 
respect to answering test questions beyond simply "knowing" or 
"not knowing" the answer, and that these strategies resulted in 
differential levels of performance on the part of the students. 
This investigation is described in detail in the manuscript in 
Appendix B entitled, "An Analysis of Children's Strategy Use on 
Reading Achievement Tests." This manuscript has been published in
Elementary School Journal. Additional evaluation of the data from 
this investigation indicated the existence of a developmental 
trend through the elementary grades in the use of elimination 
strategies on ambiguous multiple choice items (Scruggs & Bennion, 
1983). That is, as children got older, they became more 
proficient with respect to their spontaneous ability to eliminate 
inappropriate or obviously incorrect alternatives. These results
have also been described in detail in the manuscript entitled, 
"Developmental Aspects of Test-Wiseness for Absurd Options: 
Elementary School Children," which is given in Appendix C. 

A test of "passage independence" of reading comprehension 
test items on the Stanford Achievement Test was developed and 
validated by administering items from the Reading Comprehension 
subtest of the SAT to college undergraduates (Scruggs, Lifson, & 
Bennion, 1984). The purpose of this investigation was to 
determine what proportion of these test items were potentially 
answerable by employing prior knowledge or deductive reasoning 
skills. It was determined that college undergraduates were able 
to answer nearly 80% of these questions on the average, with many 
students answering them all correctly. This pilot investigation 
is given in Appendix D under the title, "Passage Independence in 
Reading Achievement Tests: A Follow-Up," and has been published
in the journal Perceptual and Motor Skills.

Follow-up investigations. Two follow-up investigations were
intended to examine more precisely the nature of test-taking
strategies employed by learning disabled students, specifically as 
compared with the strategies employed by their non-disabled 
counterparts. In one investigation, LD and non-LD students were 
administered items from the Stanford Achievement Test, Reading 
Comprehension subtest, with the actual reading passages deleted 
from the test (Scruggs, Bennion, & Lifson, 1984). Students were 
told to simply answer the questions the best that they could. In 
the second experiment, all items were read to both groups of 
students in order to control for general reading ability. In both 
experiments, students not classified as learning disabled scored 
significantly higher on this test of "passage independent" test 
items than did their learning disabled counterparts. These
results (a) indicated that learning disabled students may differ
with respect to spontaneous test-taking strategies, such as use of
prior knowledge and deductive reasoning skills, and (b) raised the
issue of what such test items are actually measuring, since they
could be so easily answered without having read the corresponding 
passage. This investigation has been written in manuscript form 
and is in Appendix E under the title, "Are Learning Disabled 
Students Test-Wise: An Inquiry Into Reading Comprehension Test
Items." It has been published as an ERIC document and was
presented at the annual meeting of the American Educational 
Research Association, Chicago, April, 1985* (see footnote, page 
_). 

In a second investigation, learning disabled and non-learning 
disabled students were directly questioned with respect to
strategies they employed on reading comprehension test items and
letter sounds test items (Scruggs, Bennion, & Lifson, 1985). In
this investigation, it was found that learning disabled students 
did not differ from their non-disabled peers with respect to 
answering recall comprehension questions, with ability to read 
controlled. However, learning disabled students were less likely 
to employ appropriate strategies to answer inferential questions 
and reported inappropriately high levels of confidence in their 
responses. In addition, when they did report using appropriate 
strategies, they were much less likely to employ them 
successfully. This project is described in detail in the 
manuscript, "Learning Disabled Students' Spontaneous Use of Test-
Taking Skills on Reading Achievement Tests" (Appendix F). This
manuscript has been published in Learning Disability Quarterly and
was presented at the annual meeting of the American Educational 
Research Association in New Orleans in April, 1984. 

Separate answer sheets. Since a major format change in
standardized tests, which takes place in the upper elementary 
grades, is the use of separate answer sheets, a preliminary 
evaluation was made of the relative ability of learning disabled 
students to utilize separate answer sheets (Tolfa & Scruggs, 
1986). Results of this investigation indicated that LD students
differed with respect to speed of responding, but not accuracy of
responding, with speed controlled. In addition, descriptive 
results suggested the LD students may be more likely to go outside 
the line of the answer circle. The manuscript which describes 
this investigation is entitled, "Can LD Students Effectively Use
Separate Answer Sheets?" and is found in Appendix N. As a result 
of this investigation, it was determined to include a training 
component for the effective use of separate answer sheets. 

Attitudes Toward Tests 

In a separate investigation, some evidence was provided that 
a sample of elementary-age behaviorally disordered students scored
significantly lower than their nonhandicapped counterparts with
respect to reported attitudes towards tests and the test-taking 
situation (Scruggs, Mastropieri, Tolfa, & Jenkins, 1985). This 
manuscript was published in the journal Perceptual and Motor
Skills and is given in Appendix G. An additional investigation
(Tolfa, Scruggs, & Mastropieri, 1985) replicated this finding; it
was also published in Perceptual and Motor Skills and is given in
Appendix H. These investigations, taken together, provided valuable
information regarding the optimal training package to be
developed for use with mildly handicapped students. 

Format Changes 

An evaluation of all major achievement tests was also made in 
order to determine whether tests were similar or different with 
respect to format demands on the test taker (Tolfa, Scruggs, & 
Bennion, 1985). In this investigation, all levels of six major
achievement tests were evaluated for number of format changes per
minute throughout the reading achievement subtests. It was
determined that achievement tests varied widely with respect to 
format demands, with most format changes occurring in the primary
grades. These results are documented in the manuscript, "Format 
Changes in Reading Achievement Tests: Implications for Learning 
Disabled Students," which can be found in Appendix I and has been 
published in Psychology in the Schools.
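
The "format changes per minute" index described above can be illustrated with a minimal sketch. The report does not give the exact scoring procedure, so the version below rests on assumptions: each item is tagged with a hypothetical format label, a change is counted whenever adjacent items carry different labels, and the count is divided by the subtest's time limit in minutes. The labels and the 20-minute limit are invented for illustration only.

```python
def format_changes_per_minute(item_formats, minutes):
    """Return the number of format changes per minute of testing time.

    item_formats: format labels for the items, in test order (hypothetical labels).
    minutes: time limit of the subtest in minutes (must be positive).
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    # A change is counted each time an item's format differs from the previous item's.
    changes = sum(
        1 for prev, cur in zip(item_formats, item_formats[1:]) if prev != cur
    )
    return changes / minutes


# Illustrative data only; not taken from any actual test.
example_subtest = ["picture", "picture", "word", "word", "sentence", "word"]
example_rate = format_changes_per_minute(example_subtest, minutes=20)
```

Under these assumptions, a subtest that switches format often within a short time limit receives a higher index than one that keeps a single format throughout.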

Meta-Analysis

In order to evaluate appropriately all previous attempts to
train test-taking skills in the elementary grades, a meta-analysis 
was completed of all available studies in this area (Scruggs, 
White, & Bennion, in press). It was determined that although the 
general effect of training was positive, differences in favor of 
training groups did not seem to become substantial unless training
was relatively extensive. In addition, this meta-analysis
revealed that low SES children and primary grade children were
more likely to benefit from extended training hours. This seems 
to underline the importance in the present project of implementing 
a package with a higher level of intensity. Also, an evaluation 
of effect sizes provided information for power estimates for the 
training projects. The detailed results of this meta-analysis are 
given in Appendix J under the title, "Teaching Test-Taking Skills
to Elementary Grade Students: A Meta-Analysis." This manuscript
has been accepted for publication in Elementary School Journal.
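
As an illustration of the kind of effect size such a meta-analysis aggregates, the following sketch computes a standardized mean difference between a trained and an untrained group. The report does not state which effect size formula was used, so the pooled-standard-deviation form here is an assumption, and the scores are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_mean_difference(treatment, control):
    """Return (treatment mean - control mean) divided by the pooled SD.

    Assumes each group has at least two scores and nonzero pooled variance.
    """
    n_t, n_c = len(treatment), len(control)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n_t - 1) * stdev(treatment) ** 2 + (n_c - 1) * stdev(control) ** 2
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Illustrative scores only; not data from the project.
trained_scores = [12, 14, 16]
untrained_scores = [10, 12, 14]
example_effect = standardized_mean_difference(trained_scores, untrained_scores)
```

An effect size expressed in standard deviation units can be compared across studies that used different tests, which is what makes it useful for estimating the sample sizes needed in later training experiments.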

Other Products 

Finally, during the first part of the project, the scope of 
the proposed research was described and published by Exceptional 
Children in the fall of last year and is given in Appendix A under
the title, "Research in Progress: Improving the Test-Taking
Skills of Learning Disabled and Behaviorally Disordered Elementary 
Students." In addition, during fall 1983, preliminary findings 
were reported at the seventh annual conference of Severe Behavior 
Disorders of Children and Youth in Tempe, Arizona, in a 
presentation entitled, "Training Behaviorally Disordered Children 
to Take Tests." 

Summary 

It was the intention of all of the above investigations to
evaluate both tests and test-taking strategies of mildly 
handicapped students in order to determine the most likely 
strategies for intervention and the form that intervention should 
take. In all, it was determined that mildly handicapped students 
do differ from their nonhandicapped peers with respect to use of
appropriate strategies on standardized achievement tests. It was 
also determined that these strategy deficits include use of prior 
knowledge, use of deductive reasoning skills, attention to 
appropriate distractors, and selection of strategies appropriate 
to correctly answering different types of items. Also, the
reported negative attitudes toward tests required the development of
materials which were short in duration, intensive, and positive in
their approach to test-taking. Finally, the results of the meta-
analysis provided important information concerning optimal sample
sizes and training length. 

TRAINING TEST-TAKING SKILLS

Development and Revision of Training Materials

Based upon results of the above investigations and careful
evaluation of the Stanford Achievement Test, materials were
developed which were intended to teach to second, third, and
fourth grade children in special education placements skills
appropriate to the successful taking of the Stanford Achievement 
Test. These materials included eight scripted lessons and a 
student workbook of exercises on subtests meant to be very similar 
to those used on the Stanford Achievement Test. These materials 
were intended to teach both general test-taking strategies, such 
as efficient time usage, as well as specific lessons meant to
increase understanding of the particular test demands of the 
individual reading subtest of the Stanford Achievement Test. 
These materials are included with the Year 1 Final Report and 
corresponding ERIC Document and are entitled "Super Score" 
(Scruggs & Williams, 1984). 

Following the preliminary development of materials, they were 
pilot-tested on two groups of second grade children with learning 
and behavioral disorders. On the basis of this pilot 
investigation, several revisions were made in the materials. 
Specifically, some of the lessons proved to be too long, and some 
instructions were judged to be ambiguous. In addition, a pre- and 
posttest measure which was developed for use with this population 
was also judged to be inadequate to effectively assess progress 
made on these materials.

On the basis of the initial pilot investigation, the
materials were revised and expanded to include second to fourth 
grades and were then implemented in a larger field test involving 
16 students in special education placements in second and third 
grades (Scruggs & Tolfa, 1984). Students were randomly assigned 
to treatment and control groups at each of the three grade levels, 
and the lessons were administered to the treatment groups. 
Students in the experimental group were seen to score higher than 
students in the control group on a shortened version of the 
Stanford Achievement Test, Word Study Skills subtest. This 
investigation was reported in a manuscript which was published in
Perceptual and Motor Skills (Appendix K).

Training Studies 

Reading tests in the primary grades. Some final revisions
were made of the training materials on the basis of the second 
field test, and materials were finally prepared for spring 
implementation immediately prior to district-wide standardized 
test administration. While final revisions were being made,
individual schools were contacted to be involved in a larger 
experimental study intended to validate these materials. For this 
study, approximately 110 students enrolled in special education 
classes in Grades 2, 3, and 4 in two different large elementary 
schools were selected and randomly assigned to treatment and 
control conditions (Scruggs & Mastropieri, in press). Four 
persons, including the principal investigator, took part in the 
two-week training period which was administered at the end of 
March. This training was administered in eight 20- to 30-minute
sessions given from Monday to Thursday for each of two weeks
immediately prior to district-wide test administration. At the
same time, materials were developed intended to increase test- 
taking skills on the Comprehensive Test of Basic Skills and were 
administered in the school districts adjacent to Utah State 
University (Scruggs, Bennion, & Williams, 1984). This training 
package was implemented in local third grade classes in order to 
determine (a) whether these procedures were appropriate for whole- 
class administration, (b) whether the materials developed for the 
Stanford Achievement Test could be easily adapted to other tests, 
and (c) whether such training could be seen to have an impact upon 
test scores, attitudes, and time-on-task during test
administration. 

The results of the training on the Comprehensive Test of 
Basic Skills in the local third grade classes indicated that 
students' attitudes had, in fact, qualitatively improved as a
result of the test training. It was suggested that the test 
training had resulted in a more normal distribution of attitudes 
after the end of the three days of testing and implied that the 
training had made the test-taking experience itself less traumatic 
on the part of third grade regular classroom students (including 
15% mildly handicapped students). Time-on-task during directions 
and during the test-taking experience itself did not seem to be 
affected by the training package. In addition, the training was 
seen to significantly increase the scores of students in the lower 
half of the class on the Word Attack subtest of the reading test.
Analysis of the top half, or the group as a whole, was not
possible due to the presence of strong ceiling effects in both
experimental and control groups. This investigation has been
written in manuscript form and is given in Appendix L under
the title, "The Effects of Training in Test-Taking Skills on Test 
Performance, Attitudes, and On-Task Behavior of Elementary School 
Children." 

Results of the training package with second, third, and 
fourth grade special education students also indicated that the 
training was successful in improving scores on standardized
achievement tests. Although only descriptive differences were 
seen in the reading comprehension subtest, the training package 
significantly improved the performance of the experimental 
students over control students in the Word Study Skills subtest, 
replicating the findings of Scruggs and Tolfa (1985), and Scruggs, 
Bennion, and Williams (1984). This improvement was judged to be 
approximately equivalent to a three- to four-month increase in 
equivalent grade level. The fact that improvement in the Word
Study Skills subtest was observed was considered to be due to the 
fact that this particular subtest involved many smaller subtests, 
several format changes (e.g., Tolfa, Scruggs, & Bennion, 1985), 
and potentially confusing directions for which the training 
package was thought to have been particularly helpful. Descriptive
differences were seen in other subtests of the SAT, but because
these were not statistically significant, it is not possible to
determine whether they were a result of the training or simply
sampling error.
Evaluation of scores of the second grade students indicated that 
they apparently had not benefited from the training package. 
However, the differentially small number of subjects in the second 
grade sample, attrition suffered during the training, and the fact 
that the two 2nd grade groups were in retrospect found to have 
differed with respect to the previous year's testing, obscure 
clear interpretation of this data. It may be, for example, that 
second grade LD and BD students have insufficient reading and
other academic skills to enable them to benefit from this training
package, or it could be that these students had, in fact, 
benefited, but that due to sampling and attrition problems these 
benefits were not observed. This entire investigation has been 
described in detail and is given in Appendix M under the title,
"Improving the Test-Taking Skills of Behaviorally Disordered and
Learning Disabled Children," which has been accepted for
publication in Exceptional Children.

Test-training: Upper elementary grades; reading and math
subtests. Based upon the results of the pilot testing and the
results of training from previous investigations, an experimental 
study involving approximately 100 students in special education 
classes in grades four through six was implemented immediately 
prior to the regularly scheduled administration of district-wide 
tests. This training employed five 20- to 30-minute lessons with 
accompanying workbooks, given in Appendix O.

Test scores of experimental and control students were entered
into a 2 (experimental vs. control) by 2 (LD vs. BD) analysis of 
variance on each of the five trained subtests. Results replicated 
those of previous investigations in that a significant effect was 
found for trained students on the Word Study Skills subtest.
Trained students scored an average of 9 percentile points higher 
than untrained students, consistent with previous findings, and 
considerably higher than many previous findings with non- 
handicapped students (Scruggs, White, & Bennion, in press). In 
addition, a significant effect favoring trained students was found 
on the Mathematics Concepts subtest. An obtained interaction on 
this subtest indicated that training had exhibited a differential 
effect on behaviorally disordered students. In addition, a 
descriptive but non-significant effect favoring trained students 
was found on the Mathematics Computation subtest. As in previous 
investigations, no effect was found for the Reading Comprehension 
subtest. This investigation is described in detail in the 
manuscript entitled, "The Effects of Coaching on the Standardized
Test Performance of Learning Disabled and Behaviorally Disordered 
Students," which is given in Appendix P. 

Summary. Taken together, the results of these experimental
training studies suggest that learning disabled and behaviorally 
disordered students do exhibit deficits relative to their non- 
disabled peers on test-taking skills, but that these skills are 
trainable. Effect sizes from these training investigations were
substantially larger than those previously reported for non-
handicapped students (Scruggs, White, & Bennion, in press),
further supporting a "test-taking skills deficit" on the part of
learning disabled and behaviorally disordered students (Scruggs &
Lifson, 1985). The differential problems with test-taking skills
on the part of handicapped students, compared with the often
exaggerated role of test-taking skills on the part of non-
handicapped students, have been described in the article, "Current
Conceptions of Test-Wiseness: Myths and Realities," which has been
published in the journal School Psychology Review (Appendix P).

1. Teacher implementation; Iowa Test, elementary grades.
This aspect of the project has been less conclusive in the 
findings. In one investigation, 40 special education teachers in 
Mesa, Arizona, were assigned at random to training and control 
conditions. Training condition teachers were given inservice 
instruction in implementation of training materials specifically 
developed for the Iowa Test of Basic Skills, the test recently
adopted for use in the State of Arizona. Implementation of the 
procedures proceeded in general as planned, although confident 
reporting of the findings was compromised by several factors. 
First, assignment of teachers to experimental units in such an 
investigation has strong statistical support, but presented 
practical problems, particularly since many teachers worked in the 
same classrooms, and saw many of the same students. An attempt to 
reassign some teachers to groups for practical reasons resulted in 
bias in an unknown direction. Second, the number of teachers was
too large to permit careful documentation of the extent of actual
treatment fidelity. It was thought that a covariance adjustment
may have compensated somewhat for the assignment problems, as well 
as the substantial amount of variability across the district; 
however, the large amount of missing previous years' test data 
prohibited such an analysis. 

The implementation of this particular procedure did not 
result in observed statistically significant differences between 
training and control groups. The general conclusion made by the
investigators was that methodological difficulties had obscured
findings, although of course the finding of "no observed
differences" may have been a true one. The materials were
sufficiently popular with teachers, however, that personnel from
that district requested similar materials for secondary level 
students. These materials were developed and delivered to the 
district, along with "consumer satisfaction" forms, to be returned 
for evaluation. Because the materials arrived somewhat late for 
use by all teachers, however, results from the consumer 
satisfaction evaluation will not be complete until spring of 1987. 

2. Teacher implementation; Stanford Achievement Test,
secondary students. Because of the difficulties associated with
such a large-scale implementation as that of the previous 
investigation, it was determined that a smaller number of 
teachers, within one school population, should be employed for the 
second teacher administration study. In this investigation, a 
junior high school containing approximately 120 LD and BD students 
in the Salt Lake City area was selected. Two of the four teachers 
implemented the training on a randomly selected half of the
students, using materials developed for this investigation 
(Appendix _). Fidelity of implementation was observed regularly. 
It was determined that, although teachers implemented training 
materials in accordance with instructions, considerable 
contamination across experimental conditions was observed. In 
addition, direct training given some control subjects proved to be
highly similar to that given experimental group subjects. Evaluation of
posttest data again revealed no statistically significant 
differences in favor of the experimental group. Although it may 
be true that such training produced little effect on test-taking 
performance of secondary aged, learning disabled and behaviorally
disordered students, it seemed more probable to the investigators 
that implementation difficulties had again compromised objective 
evaluation of training. 

Summary. Two attempts at evaluating the effectiveness of 
teacher implementation of training materials, on the elementary
level with the Iowa Test of Basic Skills, and on the secondary
level with the Stanford Achievement Test, failed to produce
observable, statistically significant differences favoring 
training condition students. Although it could be concluded that 
teacher-led training was less likely to result in gain in test- 
taking performance, the investigators felt that such conclusions 
could not be made with confidence. 

Rather, such findings underline the difficulties inherent in 
attempts to implement treatments under so-called "real world"
conditions. In both cases, unavoidable treatment contamination 
was observed, and exceptions had been made in random assignment to 
treatment conditions. Such problems were not encountered in the 
three previous training experiments in which project personnel 
delivered experimental treatments in settings far removed from the 
students' regular classrooms. However, on at least one occasion 
in these investigations, a classroom had to be excluded from
analysis because teachers had tampered with random assignment.
Such problems created difficulties from a research point of view,
but offered practical support that teachers viewed the training as 
beneficial and sometimes attempted to include students on the 
basis of perceived need, rather than on the basis of 
methodological rigor. 

In all, it must be concluded that the results of the teacher 
implementation studies were inconclusive. Taken together with all 
previous investigations, however, it still appears possible to 
conclude that these training materials can result in tangible 
improvement in test-taking performance of learning disabled and 
behaviorally disordered students.

Additional Investigations

Concurrent with the implementation and training studies, 
several additional investigations were undertaken to further 
evaluate the role and utility of standardized achievement tests 
with learning disabled and behaviorally disordered students. 
These investigations are described below. 

Group vs. individually administered tests. In a study
reported by Mastropieri (1986), in Appendix _, the scores of a
randomly selected sample of learning disabled students on the SAT
and on the individually administered Woodcock-Johnson achievement
test were compared. It was determined that, although
moderate correlations were found between the two tests, the 
Woodcock-Johnson test showed no intercorrelations between reading 
and math subtests, as did the SAT. In addition, the Woodcock- 
Johnson test was found to consistently result in scores up to ten 
percentile points lower than those of the SAT. Basing placement 
decisions on the Woodcock-Johnson, then, would result in the 
identification of a larger number of LD students than would the 
use of the SAT. 

Predictive validity for behaviorally disordered students.
This study was conducted in cooperation with the Neuropsychiatric
Institute of the University of California--Los Angeles, and
evaluated the predictive validity of achievement gain while in the 
Neuropsychiatric Institute on success of future placements 
(Forness, Kavale, Guthrie, Scruggs, & Mastropieri, in press). In 
this study, academic gains (as measured by the CTBS) of 110 
children and adolescents hospitalized for psychiatric disorders 
were examined in relationship to subsequent school outcome, as 
measured by teacher ratings in post discharge classrooms. Effects 
of IQ, severity of diagnostic disorder, and related variables were 
also eki .lined as was type of classroom placement. Although 
achievement gains during hospitalization did not appear to predict 
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outcome ratings, there was an observed relationship between 
initial CTBS math cores and ?cademic outcome. The report of this 
investigation is entitled "Predicting Outcome Through Achievement: 
Academic Gains During Psychiatric Hospitalization as a Measure of 
School Progress at Fol low-Up," in Appendix _. It has been 
accepted for publication in Behavioral Disorders - 

Achievement of LD and BD students. Another investigation was 
conducted to examine possible differences between LD and BD 
students on standardized achievement tests (Scruggs & Mastropieri, 
in press). Test scores of over 1,400 LD and BD students on all 
subtests of the Stanford Achievement Test were evaluated. It was 
concluded that serious academic deficiencies existed in both 
populations. Although some significant differences were found 
between LD and BD populations, these differences were trivial in 
magnitude. The full report is in Appendix _, entitled "Academic 
Characteristics of Behaviorally Disordered and Learning Disabled 
Students," and has been accepted for publication in Behavioral 
Disorders. 

Summary. These three investigations provided evidence on the 
use of tests in (a) describing performance, (b) predicting 
placement outcomes, and (c) discriminating between categories of 
exceptionality. It was found that results of group and 
individually administered tests may disagree, achievement gains 
may not always predict success of placement outcomes, and LD and 
BD students may not differ in any substantial way on achievement 
test scores, although the reasons for such low scores may differ 
between groups. These studies, taken together, suggest some 
shortcomings of achievement tests in defining, discriminating, and 
predicting outcomes of mildly handicapped students. Such 
findings, however, are not intended to suggest that achievement 
tests are not important. Rather, they help underline the 
necessity of using multiple measures in addition to achievement 
test scores. 

GENERAL CONCLUSIONS 

This three-year research project has involved five school 
districts in three states, literally thousands of LD, BD, and 
nonhandicapped students, and scores of school personnel, 
administrators, and project staff. The project to date has 
generated 22 publications, 7 presentations, 5 sets of training 
materials, and several additional manuscripts either currently in 
review or in preparation. Including the hundreds of individuals 
who have requested reprints, papers, preprints, and other 
information from the project, the total audience can be numbered 
in the tens of thousands. 

The overall conclusions of this project are threefold. 
First, mildly handicapped students exhibit both cognitive and 
affective deficiencies with respect to the taking of standardized 
achievement tests. This represents a substantial problem, as 
mildly handicapped students are among the most frequently tested 
groups of children. Generally, cognitive deficits have been 
observed with respect to (a) understanding of test formats; (b) 
use of test-taking strategies, including mobilization of prior 
knowledge, deductive reasoning, and elimination strategies; and 
(c) clerical aspects of test-taking, i.e., use of separate answer 
sheets. In addition, mildly handicapped students have been 
observed to report consistently more negative attitudes toward 
taking tests. Taken together, these deficiencies seem tangible 
enough to constitute an area of concern in special education. 

Second, many of these deficits are subject to remediation. 
Although specific components of test-taking skills training were 
not separately evaluated, it seems likely that mildly handicapped 
students benefited most from (a) familiarization with potentially 
confusing test formats, (b) instruction in specific test-taking 
strategies for specific subtests, (c) practice and feedback in the 
use of separate answer sheets, and (d) development of positive 
attitudes toward test taking. Although experimental outcomes of 
teacher-implemented training were less positive, those studies 
were also less tightly controlled. The overall conclusion of the 
project is that short, intensive training of test-taking skills 
using task-appropriate materials results in tangible improvement 
in test scores of mildly handicapped students. 

Finally, additional evidence has been gathered concerning the 
limitations of achievement tests. It was seen that populations 
referred for learning disabilities and behavior disorders may not 
appear different merely on the basis of achievement test scores. 
This is true even though classification criteria for behavioral 
disorders do not include direct evaluations of academic 
functioning. Also, it was seen that achievement gain may not 
predict future placement outcomes. Such findings underscore the 
importance of additional measures in special education to improve 
discrimination and prediction. Finally, it was seen that different 
tests may deliver different norm-referenced information for the 
same students. This finding is perhaps the most disturbing, and 
seems to suggest that careful evaluation of tests is necessary for 
making placement decisions. 

Altogether, the project director and project personnel, 
supported by the solicited opinions of outside experts in the 
field, feel that the project has been successful in providing 
important information regarding the administration and 
interpretation of standardized achievement tests with behaviorally 
disordered and learning disabled children. During the course of 
the project, it was also determined that an equally important area 
for research involved the test-taking skills mildly handicapped 
children exhibit on teacher-made, content-area tests. Such tests 
call for a very different set of skills than those which concerned 
the present investigators, and yet it was seen that this is also 
an area of great need in special education. It is hoped that the 
results of the present project can, in addition to reporting its 
own findings, provide encouragement for future research efforts in 
the area of content-area tests. Through such efforts, the 
knowledge base in assessment in special education can be greatly 
improved. 

MANUSCRIPTS 



Scruggs, T. E., Bennion, K., & Williams, N. J. (1985). The 
effects of training in test-taking skills on test 
performance, attitudes, and on-task behavior of elementary 
school children. Unpublished manuscript, Utah State 
University, Logan, UT. (Appendix K) 

Scruggs, T. E., & Lifson, S. A. (1984). Are learning 
disabled students 'test-wise?': An inquiry into reading 
comprehension test items. Manuscript submitted for 
publication. (Appendix E) 

Scruggs, T. E., & Tolfa, D. (1985). Developmental aspects of 
test-wiseness for absurd options: Elementary school 
children. Unpublished manuscript, Utah State University, 
Logan, UT. (Appendix C) 



APPENDIX A 

RESEARCH IN PROGRESS: IMPROVING THE TEST-TAKING SKILLS OF 
LD AND BD ELEMENTARY STUDENTS 



RESEARCH IN PROGRESS 



Charles C Cleland 
Department Editor 



Improving the Test-Taking Skills of 
LD and BD Elementary Students 



Principal Investigators: Cie Taylor and Thomas 
Scruggs, Exceptional Child Center, Utah State 
University. 

Purpose/Objective: The purpose of this investigation is to 
determine whether reinforcement techniques and direct training in 
test-taking skills can increase the validity of test scores for 
learning disabled (LD) and behaviorally disordered (BD) students. 
To determine the degree to which LD and BD students exhibit 
inappropriate (inefficient) test-taking skills, students are 
observed and interviewed while taking standardized tests. Based on 
those observational data, procedures and training packages will be 
designed to increase student performance on standardized 
achievement tests. If the procedures and training are effective, 
educational decisions, which are frequently based in part on the 
results of standardized achievement tests, will be more valid 
because problems in areas such as test-taking skills, student 
motivation, and confusion due to testing format will be reduced or 
eliminated. 

Subjects: Subjects are [number illegible] elementary students 
enrolled in 12 resource rooms and self-contained classrooms for 
children with learning disabilities and behavioral disorders. 

Methods: LD and BD children matched on age, handicap, and 
standardized achievement test score will be randomly assigned to 
experimental and control groups. Students in the experimental group 
will receive materials and procedures designed to improve the 
ability of handicapped students to take tests. Experimental and 
control groups will be compared statistically on several measures, 
including attitudes toward test-taking, student and teacher 
behavior during test administration, and actual performance on 
standardized tests of reading achievement. In following years, 
materials will be developed and implemented for mathematics 
achievement tests and test-taking skills for secondary-age 
handicapped students. 

Results to Date: Preliminary findings indicate that many LD and 
BD children, as well as low achieving nonhandicapped students, do 
not spontaneously exhibit efficient test-taking behaviors. 
Specifically, handicapped children have been seen to exhibit 
difficulties with item format and distractors more typical of 
naive test takers. 

Commencement and Estimated Completion Dates: This 
investigation began July 1, 1983 and is expected to continue for 
three years. 

Funding: Funding for this investigation has been provided by a 
grant from the U.S. Department of Education, Research in Education 
of the Handicapped. 

Publications/Products Available: Preliminary materials for 
improving test-taking skills, piloted on nonhandicapped 
second-grade students, have been developed and will be revised for 
use with handicapped children during the coming year. Manuscripts 
documenting the investigation will be completed and submitted for 
publication during the second half of the academic year. Please 
write the authors for further information. 

ongoing research in the field of special education that has not 
yet been published. Investigators wishing to report studies in 
progress are invited to submit a brief synopsis of their efforts 
to the column editor, Charles C. Cleland, 3427 Monte Vista, Austin 
TX 78731. Reports are to be submitted in triplicate and should 
follow the format shown above, with a maximum length of 500 words. 

Much of what constitutes reading instruction in today's public 
schools reflects students' scores on standardized achievement 
tests. Test performance may influence later assignment to reading 
groups, classrooms, or remedial or special education programs. 
Although norm-referenced reading tests have been criticized as 
insensitive to specific skill deficits and inadequate as complete 
diagnostic measures (Howell 1979), most reading tests have 
nonetheless been shown to be highly reliable and valid (Spache 
1976). For better or worse, standardized reading tests are truly a 
part of education today and will most likely be used in the future. 

If important decisions are to be based on the results of 
standardized reading tests, student scores should provide the best 
possible estimate of reading performance. Unfortunately, the 
results of past research indicate that reading test performance can 
be influenced by factors other than knowledge of test content 
(e.g., Taylor & White 1982). One of these factors, "test-wiseness" 
(TW), was first described in detail in 1965 by Millman, Bishop, and 
Ebel (p. 707) as "a subject's capacity to utilize the 
characteristics and formats of the test and/or the test-taking 
situation to receive a high score." Millman et al. developed an 
outline of test-wiseness principles, which included time-using 
strategies, error-avoidance strategies, guessing strategies, and 
deductive-reasoning strategies. Slakter, Koehler, and Hampton 
(1970) presented information suggesting that TW has a developmental 
component. That is, students may become more "test-wise" as they 
grow older. Generally, researchers have inferred extent of TW on 
the basis of tests constructed specifically for this purpose. 

Students themselves were questioned recently about strategies they 
use to answer test questions. Haney and Scott (1980) administered a 
number of achievement tests to 11 students, then questioned them 
the following day concerning how they attempted to answer each 
item. These researchers developed a complex model in which 
responses to interviewer questions were classified into 46 separate 
categories. Most of these categories included the use of some 
specific strategies such as guessing, elimination of alternatives, 
or "reasoning." Their results indicated that children use a wide 
range of strategies in answering test questions and that often a 
child's perception of item content bears little resemblance to the 
intentions of the test's author. Haney and Scott concluded that 
considerable "ambiguity" exists in standardized test questions, 
existing to a greater extent in science and social studies areas 
and to a lesser extent in reading areas. 

Haney and Scott's work contributed significantly to our knowledge 
of the nature of ambiguous test items. However, the focus of their 
study was on test construction, with implications for the 
reduction of test item ambiguity. Although classroom teachers may 
use the results of Haney and Scott to improve their own tests, 
published standardized tests cannot be altered by teachers. A 
remaining question concerns the extent to which students employ 
test-taking strategies when faced with difficult or ambiguous 
items. Do students use such strategies spontaneously (that is, 
without being trained)? If so, which strategies (if any) are 
effective in obtaining correct answers? No previous research can be 
located to answer these questions. 

To address these questions in the present study, the reading test 
performance of elementary school children was examined. 
Specifically, two areas were investigated: the strategies students 
spontaneously employed to answer reading test items and the 
relative effectiveness of these strategies in increasing reading 
test scores. 

Procedure 

A sample multiple-choice reading test based on items from the 
Stanford Achievement Test (SAT) (Madden, Gardner, Rudman, Karlsen, 
& Merwin 1973) was developed and piloted on five students to 
evaluate whether the length was appropriate and to establish 
reliable scoring conventions. This sample test included items from 
the Word Reading, Reading Comprehension, Word Study Skills, and 
Vocabulary subtests. After revisions had been made, it was 
administered to 31 elementary-age Caucasian students (15 girls, 16 
boys) attending summer classes in a rural western area. Students 
were selected from both remedial and "enrichment" classes so a 
range of abilities was represented. As assessed by the Woodcock 
Reading Achievement Test (Woodcock 1973), 20 students read at or 
above grade level; 11 read below their grade level. Most students 
(20) were second or third graders, but students were also selected 
from Grades 1 (two students), 4 (two), 5 (five), and 6 (two). 

All students were seen individually by one of four examiners. One 
examiner interviewed 18 students, whereas the other three 
interviewed two, four, and six students. First, students were given 
the Passage Comprehension subtest from the Woodcock Reading 
Achievement Test in order to identify an approximate reading 
comprehension grade equivalent. Students were then given selections 
from the SAT one year level higher than their assessed grade level 
on the Woodcock subtest. In this manner, a similar difficulty level 
was provided for each student. Most students were able to answer 
correctly approximately two-thirds of the test questions. 

MARCH 1985   CHILDREN'S STRATEGY USE   481 

Students were then told to read aloud each test question (as well 
as the reading passages in the Reading Comprehension subtest) and 
whichever of the distractors they chose to read. They were neither 
encouraged nor discouraged from reading each distractor. As soon as 
students had answered a test question, they were asked to rate 
their level of confidence in their response: were they very sure, 
somewhat sure, or not sure the answer they had given was correct? 
After students had finished each subtest, they were asked to reread 
the questions and tell the examiner why they had chosen their 
answer. The examiner recorded reading errors, confidence levels, 
attention to distractors, reference to reading passages, and 
reported strategies. Sessions were tape-recorded to clarify any 
later ambiguity in scoring. Students spent 45-90 minutes in the 
session and answered 31-42 test questions. Some students received 
more questions than others because different levels of the SAT 
required different subtests and formats. 

Results and discussion 
Effectiveness of strategies 

We found all strategy responses could be classified within a 
10-level hierarchy that strongly predicted the probability of 
responding correctly. Proportions of correct responses were 
computed across subjects for each type of strategy and are shown in 
figure 1. These classifications were as follows: (a) skipped 
(student skipped the item), (b) misread a key word in question or 
distractors, (c) used faulty reasoning (example: one student 
reported, "This word must be the correct answer because it has a 
period after it"), (d) did not follow directions, (e) guessed, (f) 
"seemed right" (student thought the answer was correct without 
being able to state an explicit reason), (g) used external 
information (example: "I know most people in fires die from 
breathing smoke because a fireman told me that"), (h) eliminated 
inappropriate alternatives, (i) referred to passage, and (j) 
clearly "knew" the answer (example: "I know that a pear is a kind 
of fruit"). The existence of these strategies indicates that a 
complete hierarchy of test-taking skills exists beyond simply 
knowing or not knowing the answers, and these strategies can be 
more or less effective on a standardized reading test. For example, 
as seen in figure 1, when students skipped an answer, nothing was 
correct; when they guessed, they got 37% correct; when they 
eliminated alternatives, they got 67% correct. Proportions of 
employed strategies are given in table 1. 

Fig. 1. Percent correct answers by strategy used. Strategy 
classifications: 0, skipped item; 1, misread keyword; 2, faulty 
reasoning; 3, did not follow directions; 4, "seemed right"; 5, 
guessed; 6, used external evidence; 7, eliminated; 8, referred to 
passage; and 9, clearly "knew." 
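
As a rough check on the proportions in Table 1, the percentages can be recomputed from the reported frequencies. The sketch below is illustrative only: the variable and function names are assumptions introduced here, and level 5 is labeled "Guessed" following the Figure 1 caption.

```python
# Response counts by strategy level, as reported in Table 1.
STRATEGY_COUNTS = {
    "Skipped item": 9,
    "Misread keyword": 23,
    "Faulty reasoning": 38,
    "Did not follow directions": 7,
    '"Seemed right"': 92,
    "Guessed": 127,
    "Used external evidence": 21,
    "Eliminated": 45,
    "Referred to passage": 59,
    'Clearly "knew"': 458,
}


def percent_by_strategy(counts):
    """Return each strategy's share of all responses, in percent (1 decimal)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


total_responses = sum(STRATEGY_COUNTS.values())
percents = percent_by_strategy(STRATEGY_COUNTS)

# Print a small frequency table: label, count, percent of all responses.
for name, n in STRATEGY_COUNTS.items():
    print(f"{name:<28}{n:>5}{percents[name]:>7.1f}")
print(f"{'Total':<28}{total_responses:>5}")
```

The frequencies sum to 879 responses, which agrees with the 302 incorrect and 577 correct responses reported in the discussion of attention to distractors.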

We condensed these strategies into five logical categories 
(skipping, procedural error, guessing strategy, deliberate 
strategy, and "knowing") and computed point-biserial correlations 
for each subject. The median correlation between item score and 
reported strategy was .54 (p < .01), a correlation of moderate 
strength.* No differential effects were seen by age, ability level, 
or examiner, although the sample was too small to conclusively 
investigate these possibilities. 
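
The per-subject analysis described above can be sketched as follows. This is a minimal illustration, not the authors' computation: the function name, the coding of the five categories as 1 through 5, and all of the student data are hypothetical. The point-biserial coefficient is computed from its standard formula, (mean of group 1 minus mean of group 0) divided by the population standard deviation, times the square root of pq.

```python
import math
from statistics import median


def point_biserial(item_scores, strategy_levels):
    """Point-biserial correlation between 0/1 item scores and strategy level."""
    n = len(item_scores)
    if n != len(strategy_levels):
        raise ValueError("inputs must have the same length")
    right = [s for score, s in zip(item_scores, strategy_levels) if score == 1]
    wrong = [s for score, s in zip(item_scores, strategy_levels) if score == 0]
    if not right or not wrong:
        raise ValueError("need at least one correct and one incorrect item")
    mean_all = sum(strategy_levels) / n
    # Population standard deviation of the strategy levels.
    sd = math.sqrt(sum((s - mean_all) ** 2 for s in strategy_levels) / n)
    if sd == 0:
        raise ValueError("strategy levels do not vary")
    p = len(right) / n  # proportion of items answered correctly
    mean_right = sum(right) / len(right)
    mean_wrong = sum(wrong) / len(wrong)
    return (mean_right - mean_wrong) / sd * math.sqrt(p * (1 - p))


# Hypothetical data: (item scores, strategy category coded 1-5) per student.
students = {
    "student_a": ([1, 1, 1, 0, 0, 1, 0, 1, 1, 1], [5, 5, 4, 3, 2, 5, 1, 3, 4, 5]),
    "student_b": ([1, 0, 1, 1, 0, 1, 1, 0], [5, 3, 4, 5, 2, 3, 5, 4]),
    "student_c": ([0, 1, 1, 1, 1, 0, 1], [1, 5, 5, 3, 5, 3, 4]),
}

correlations = {
    name: point_biserial(scores, levels)
    for name, (scores, levels) in students.items()
}
median_r = median(correlations.values())
print(correlations, median_r)
```

Computing one coefficient per student and then taking the median, as here, keeps a few students with many items from dominating the summary.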

Table 1. Frequencies (F) and Percent (%) of Strategies Employed 

Strategy level                      F       % 
0. Skipped item                     9     1.0 
1. Misread keyword                 23     2.6 
2. Faulty reasoning                38     4.3 
3. Did not follow directions        7      .8 
4. "Seemed right"                  92    10.5 
5. Guessed                        127    14.4 
6. Used external evidence          21     2.4 
7. Eliminated                      45     5.1 
8. Referred to passage             59     6.7 
9. Clearly "knew"                 458    52.1 

Inspecting figure 1 reveals some other interesting findings. The 
high proportion of correct scores for guessing is notable. Since 
the number of answer choices varied between subtests and levels, 
with four choices the most common format, the probability of 
responding correctly by chance alone was estimated at .28. In fact, 
when students reported guessing, they 
scored 37% correct. "Guessing" responses scored virtually the same 
as "seemed right" responses, suggesting that even when students 
believe they are guessing, they still have some idea of what the 
correct answer might be and can use this strategy to advantage. 
"Seemed right" responses were common on the vocabulary subtests, in 
which students often reported that a particular definition sounded 
correct but were otherwise uncertain. Another interesting finding 
is the high proportion of correct responses when the students 
reported using outside information or experience. Although content 
area tests, such as science and social studies, directly test 
outside knowledge, reading tests ostensibly are intended to test 
nothing besides knowledge of the passage's content. Therefore, 
although use of outside information should not help, students did 
benefit from the use of such information (however, when students 
referred to the passage, they scored even higher). The students' 
ability to use outside information as effectively as they did is 
surprising. This finding underlines the "passage independence" 
problems of reading comprehension items, a topic well investigated 
by researchers such as Tuinman (1973-74). 
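
The chance estimate of .28 is consistent with averaging 1/k over item formats with k answer choices. The authors do not report the actual mix of formats, so the sketch below uses a hypothetical mix (65% four-choice and 35% three-choice items) chosen only to show how such an estimate could arise; the function name is likewise an assumption.

```python
def chance_rate(format_mix):
    """Expected proportion correct from blind guessing.

    format_mix maps a number of answer choices to the share of items
    that use that format; the shares must sum to 1.
    """
    if abs(sum(format_mix.values()) - 1.0) > 1e-9:
        raise ValueError("format shares must sum to 1")
    return sum(share / choices for choices, share in format_mix.items())


# Hypothetical mix of item formats (not reported in the study).
HYPOTHETICAL_MIX = {4: 0.65, 3: 0.35}

expected = chance_rate(HYPOTHETICAL_MIX)
observed = 0.37  # proportion correct when students reported guessing

print(f"chance level:   {expected:.2f}")
print(f"observed level: {observed:.2f}")
print(f"difference:     {observed - expected:.2f}")
```

Under this assumed mix, reported guesses exceed blind chance by roughly nine percentage points, which fits the interpretation that students who say they are guessing often retain partial knowledge.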

Level of confidence 

Students had a reasonably good idea of whether they had answered a 
test question correctly. When students reported being "very sure" 
their answer was correct, they were correct 81% of the time. When 
they reported being "somewhat sure," they were correct only 13% of 
the time, and when they reported being "not sure," they obtained 
correct answers only 7% of the time. However, these figures are 
somewhat misleading. The results seem different if looked at 
another way: when students answered incorrectly, they also reported 
being "very sure" the answer was correct in 56% of the cases. 
Clearly, although related to performance, level of confidence in 
itself is not a sufficient check on correctness of a student's 
work. The relation between confidence and correctness of response 
was seen to vary widely from student to student, with a median 
point-biserial correlation of .29 (p > .05). Therefore, in many 
cases, other means are necessary for students to assess the 
correctness of their responses. These means will be described 
below. 
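
The two ways of looking at the confidence data correspond to two different conditional proportions: the share of "very sure" answers that were correct, and the share of incorrect answers that were given with a "very sure" rating. The sketch below shows how both can be high at once. The counts are entirely hypothetical and are not the study's data; the function names are assumptions.

```python
# Hypothetical response counts by confidence rating (not the study's data).
COUNTS = {
    "very sure": {"correct": 400, "incorrect": 100},
    "somewhat sure": {"correct": 60, "incorrect": 50},
    "not sure": {"correct": 30, "incorrect": 40},
}


def p_correct_given_confidence(confidence, counts):
    """Share of answers at one confidence level that were correct."""
    row = counts[confidence]
    return row["correct"] / (row["correct"] + row["incorrect"])


def p_confidence_given_outcome(confidence, outcome, counts):
    """Share of answers with a given outcome that carried this confidence."""
    outcome_total = sum(row[outcome] for row in counts.values())
    return counts[confidence][outcome] / outcome_total


sure_and_right = p_correct_given_confidence("very sure", COUNTS)
wrong_but_sure = p_confidence_given_outcome("very sure", "incorrect", COUNTS)

print(f"P(correct | very sure)   = {sure_and_right:.2f}")
print(f"P(very sure | incorrect) = {wrong_but_sure:.2f}")
```

Because most answers in this example are given with high confidence, a majority of the errors are also "very sure" errors, even though high-confidence answers are usually right.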

The cost of carelessness 

In addition to reported test-taking strategies, information was 
also collected on the degree to which the students attended to 
distractors and chose their answers by referring to the reading 
passage on the Reading Comprehension subtest. Results showed that 
students rarely referred to the reading passage, even though when 
they did, they stood a very good chance of answering the question 
correctly. In 89% of the cases where students answered a reading 
comprehension question incorrectly, they had not referred to the 
passage that clearly contained the correct answer. Of course, this 
does not mean that all of these questions could have been answered 
correctly had students referred to the passages, but it does appear 
that reading scores could be greatly improved by students' 
increased attention to the passages. 

Similarly, a great deal of carelessness was observed in attention 
to distractors. When students answered incorrectly, in 40% of the 
302 cases they had not read all distractors. Again, this finding 
does not mean all these questions could have been answered 
correctly by greater attention to distractors, but students could 
almost certainly have improved their scores by doing so. When 
students answered questions correctly, they had attended to all 
distractors in 73% of the 577 cases. It does appear, then, that 
test performance can be improved through greater attention to 
distractors. 
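
The reported figures (40% of 302 incorrect responses, 73% of 577 correct responses) can be arranged as a two-by-two table to compare the proportion correct with and without full attention to distractors. The cell counts below are approximations reconstructed here from the rounded percentages; they are not reported by the authors, and the comparison is descriptive only.

```python
INCORRECT_TOTAL = 302  # incorrect responses
CORRECT_TOTAL = 577    # correct responses

# Approximate cell counts reconstructed from the rounded percentages.
incorrect_not_all = round(0.40 * INCORRECT_TOTAL)  # did not read all distractors
incorrect_read_all = INCORRECT_TOTAL - incorrect_not_all
correct_read_all = round(0.73 * CORRECT_TOTAL)     # read all distractors
correct_not_all = CORRECT_TOTAL - correct_read_all

# Proportion correct within each attention group.
p_correct_read_all = correct_read_all / (correct_read_all + incorrect_read_all)
p_correct_not_all = correct_not_all / (correct_not_all + incorrect_not_all)

print(f"read all distractors:    {correct_read_all} correct, "
      f"{incorrect_read_all} incorrect, {p_correct_read_all:.2f} correct")
print(f"did not read all:        {correct_not_all} correct, "
      f"{incorrect_not_all} incorrect, {p_correct_not_all:.2f} correct")
```

On these reconstructed counts, about 70% of responses were correct when all distractors were read, against about 56% when they were not. As the text cautions, this association does not show that reading every distractor would have converted the remaining errors into correct answers.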

Another surprising finding was the relatively small effect of 
reading errors. Although performance was clearly impaired when 
students misread a word of key importance (see fig. 1), in general 
misreading words was less detrimental than might be expected. When 
students misread one or more words in stem or distractor, the 
proportion of items answered correctly (58% of 293) was still quite 
high. Clearly, many students have developed strategies for coping 
with words they cannot read. It seems important to remind students 
not to "give up" if they cannot read every word. As the present 
investigation indicates, students are often able to answer 
correctly even though they cannot read every word. 

One final finding concerning carelessness can be reported. All 
examiners noted the extent to which students had acted on the wrong 
stimulus in the "word study skills" subtest. In this subtest, 
students are given a word with an underlined sound and asked to 
find the same sound in one of three distractors. The following 
problem provides an example: 

     prize 
     (a) prince 
     (b) size 
     (c) seven 

The correct answer is b because the z in "size" has the same sound 
as the underlined z in "prize." What was surprising to us is that 
students often attended to the wrong stimulus, for example, the 
initial pr in the above question. Although the exact incidence of 
these errors cannot be given, their consistent occurrence seems to 
imply that teachers should stress the importance of attending to 
the underlined sound only. 

Conclusions 

The results of this study demonstrate that students do employ 
specific strategies to cope with test item ambiguity, indecision, 
or lack of knowledge in selecting correct answers. These findings 
have important implications directly bearing on student performance 
during testing. To attain the most correct answers, students should 
employ the strategies listed below: 

1. Be certain to attend to all distractors and refer to the 
   reading passage, even if you are "very sure" your answer is 
   correct. 

2. If you are having great difficulty reading a passage, read the 
   questions and try to answer them anyway. Often, your own 
   knowledge can help you choose an answer. If you have difficulty 
   with some words in the question or distractors, answer anyway 
   and base your answers on the words you can read. 

3. If you have attended to all parts of a passage and test 
   question and still do not know an answer, there is still a good 
   chance of getting the correct answer if you guess. 

4. Be certain you are attending to the appropriate stimulus, such 
   as the underlined sound in a "word study skills" subtest. As in 
   other subtests, wrong answer choices may look correct at first 
   glance. 

5. Make sure you answer every item. Even if you must hurry and 
   guess frequently near the end, you will probably get some of 
   the answers correct. 

Considering the results of past research 
(Bangert, Kulik, & Kulik 1983), it is likely 
that to affect test performance signifi- 
cantly, a teacher will have to do more than 
simply read the above points to students. 
Examples and practice activities will help 
students develop these test-taking skills. 

These findings should be of interest to special education 
teachers, particularly those in the area of learning disabilities. 
Many children are referred for special class placement on the basis 
of deficiencies in standardized reading-test scores. Special 
education often is quite beneficial to students who clearly need 
it, but before taking such a dramatic step, teachers should be 
certain that the test score reflects the best abilities of the 
student rather than a problem with test taking in general. 

The present investigation indicates that 
a range of abilities exists in test-taking skills, 
as it does in other areas. If tests are to be 
as valid as possible, the specific skills ob- 
served in efficient students taking a read- 
ing test should be practiced by all students. 
If test-taking skills are incorporated in 
general test-administration procedures, it 
appears maximum benefit can be derived 
from the use of standardized reading tests. 



Notes 



The authors would like to thank Dr. Ginger Rhode and Judy Johnson, 
as well as Dr. Jay Monson, acting director, and the staff of the 
Edith Bowen School, particularly Dorothy Dobson and Lou Anderson, 
for their valuable assistance with this project. The authors would 
also like to thank Ursula Pimentel and Marilyn Tinnakul for typing 
the manuscript. Address requests for reprints to Thomas E. Scruggs, 
Exceptional Child Center, UMC 68, Utah State University, Logan, 
Utah, 84322. 

*A point-biserial, rather than a Spearman correlation of ranks 
coefficient, was computed out of concern for the necessarily high 
number of ties resulting in computing a rank correlation with 
binary data. However, the obtained Spearman coefficient of .55 
differed by only one point from the obtained point-biserial 
coefficient of .54. 

Abstract 

Twenty-eight students from grades 1 through 5 were administered a test of 
test-wiseness for absurd options. Results suggested that a developmental 
trend may exist in test-wiseness for elementary-age school children. 

Developmental Aspects of Test-Wiseness for Absurd 
Options: Elementary School Children 

First discussed by Thorndike in 1951, test-wiseness (TW) was described 
in detail by Millman, Bishop, and Ebel (1965), and defined as "a subject's 
capacity to utilize the characteristics and formats of the test and/or test- 
taking situation to receive a high score" (p. 707). They further described 
TW as "logically independent of the examinee's knowledge of the subject 
matter for which the items are supposedly measures" (Millman et al., 1965, 
p. 707). Ebel (1965) has suggested that error in measurement is more likely 
to be obtained from students low in test-taking skills. The student low in 
TW, therefore, may be more of a measurement problem than the student high in 
TW (Slakter, Koehler, & Hampton, 1970b). 

Some investigations have indicated that TW has a developmental 
component; that is, that TW increases with age. Slakter, Koehler, and 
Hampton (1970a) administered a measure of TW to students from grades 5-11 
and found a significant overall linear trend for grade level. Crehan, 
Koehler, and Slakter (1974) administered a TW test to students in grades 7 
through 11, and a follow-up test to the same students two years later. 
Increases over all intervals except grades 9 to 11 were found. In a second 
follow-up of the same students, Crehan, Gross, Koehler, and Slakter (1978) 
replicated the previous findings and concluded that although TW increases by 
grade, large individual differences exist within grade levels. 

Although the above investigations provide strong support for a 
developmental component of TW in the secondary grades, as yet no 
investigation has evaluated the developmental nature of TW in the elementary
grades. The present investigation is intended to address this question. 

Method 

Subjects were 28 elementary-school-age children attending summer
classes prior to entering grades 1 through 5 in a western rural community. 
Students (1 first grader, 9 second graders, 11 third graders, 2 fourth 
graders, and 4 fifth graders) were selected from both remedial and 
"enrichment" classes so that a variety of ability levels was sampled. 

Students were seen individually by one of four examiners. First, they 
were administered a five-item test of TW. This test was developed to 
measure the ability of students to eliminate options known to be incorrect 
(corresponding to the Millman et al., 1965 TW category I-D-1, absurd 
options). For example, one of the items was the following: 

     Good airplane pilots must be able to ________ quickly in an emergency.

     1. fall asleep     3. sturnate
     2. scream          4. thing

Students were orally provided with words they were unable to read. Since it 
was thought that evidence of TW would be more subtle in an elementary school 
population than it was in studies of secondary students, some departures 
were made from the procedures of Crehan et al. (1974). First, students were 
directly questioned regarding the reasons for their answer choices following 
completion of the test. Second, students were scored as reporting no 
elimination strategies (0), or reporting one or more strategies (1), 
regardless of the "correctness" of their answer to each test question. 
Results and Discussion
A point-biserial correlation was computed between entering grade level 
of student and presence or absence of reported elimination strategies. The 
resulting coefficient, .44, was statistically significant (p < .02) and 
represented a moderate relation between grade level of student and reported 
use of elimination strategies, accounting for approximately 20% of total 
variance. Proportion of students reporting use of elimination strategies by 
grade level is given in Figure 1. 
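
The link between the reported coefficient and the "approximately 20%" variance figure can be checked with a short sketch. The function below uses one common textbook formulation of the point-biserial coefficient; it is not necessarily the computation the author used, and the function names are hypothetical. Here `binary` would be the 0/1 strategy score and `values` the entering grade level.

```python
import math

def point_biserial(binary, values):
    """Point-biserial r between a 0/1 variable and a numeric variable."""
    n = len(values)
    ones = [v for b, v in zip(binary, values) if b == 1]
    zeros = [v for b, v in zip(binary, values) if b == 0]
    p = len(ones) / n          # proportion scored 1
    q = 1 - p                  # proportion scored 0
    mean = sum(values) / n
    # Population standard deviation of the numeric variable
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    mean_diff = sum(ones) / len(ones) - sum(zeros) / len(zeros)
    return mean_diff / sd * math.sqrt(p * q)

def variance_explained(r):
    """Proportion of variance accounted for by a correlation r."""
    return r ** 2
```

Applied to the reported coefficient, `variance_explained(0.44)` gives about .19, which is consistent with the "approximately 20%" stated in the text.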



Insert Figure 1 about here 



Thus, it appears that a developmental trend in one aspect of TW can be 
observed in children of elementary school age, and that this trend is 
similar to that seen in older students. These findings must be interpreted 
with caution, however, due to the limited sample size, as well as the fact 
that only one aspect of TW was measured. Although further research is 
needed, the results of this preliminary investigation suggest that students 
begin to learn TW skills as early as the primary grades, and that these 
skills continue to improve with age. 
Footnote



The author would like to thank Karla Bennion, Steve Lifson, Dr. Jay
Monson and the staff of the Edith Bowen School for their assistance on this 
project. 
Figure Caption

Figure 1. Proportion of students reporting elimination strategies by 
grade level. 
PASSAGE INDEPENDENCE IN READING ACHIEVEMENT
TESTS: A FOLLOW-UP*

STEVE LIFSON, THOMAS E. SCRUGGS, AND KARLA BENNION
Utah State University

Summary. 38 college undergraduates were administered reading-comprehension
items from a major standardized achievement test with corresponding
passages deleted. Analysis indicated that, after 20 years of similar research
findings, highly passage-independent items still occur on major tests.

For almost 20 years, it has been documented that reading-comprehension
test items can be answered correctly at above-chance rates without actually
reading the relevant passage (Preston, 1964). Pyrczak (1976) mentions
several types of items which seem particularly independent of the passage.
These types include (a) items that can be answered from the examinee's own
knowledge and (b) items about a particular passage that are related to each
other in such a way that some items provide clues for other items. Reading-
comprehension tests which include such items invite critical attention on the
grounds that (a) examinees using these strategies may have an advantage over
those not using them (Pyrczak, 1972) and (b) if a subject uses these principles
and skips passages, he invalidates the purpose of the test (Tuinman, 1973-1974).
Since an extensive review of the literature has shown no justification for the
use of passage-independent items, the question arises as to whether these items
still occur in commonly used standardized achievement tests. The present
investigation was intended to determine whether such items are still in use.

Method 

Subjects and Materials

Thirty-eight undergraduate elementary education students at a western
university completed 16 multiple-choice reading-comprehension questions
without the accompanying passages. The items selected were thought to
represent questions that could be answered without having read the accompanying
passage. These items were chosen to correspond to Millman, Bishop, and Ebel's
(1965) categories of test-wiseness strategies involving the general knowledge
of the test taker and use of subject matter of neighboring items. The specific
effects of these cues, however, were not addressed in this study. The 16 items
were taken from the Stanford Achievement Test, Form E, Level P-3, from a
pool of 60 items. The items were kept in clusters illustrating which belonged
together in terms of association with a particular passage.

*The authors thank Dr. Barnard Hayes for his kind and generous assistance with this
investigation. Requests for reprints should be addressed to Steve Lifson, Exceptional
Child Center, UMC 68, Utah State University, Logan, Utah.
Procedure

The materials were distributed to two sections of a class in teaching reading.
The students were told: "Today I'm going to give you some reading-
comprehension test items without the passages. It is not expected that you
will answer all of the questions correctly; just do your best. Guess if you do
not know the answer." No time limit was imposed upon the task.

Results and Discussion 

Analysis indicated that the mean score was 75% correct, with a mean
score of 11.9 of the 16 items. A one-sample t test (Hays, 1973) confirmed
that the obtained scores were significantly different from chance
responding (t = 18.9, p < .001).
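
For readers unfamiliar with the procedure, a one-sample t test against a chance score can be sketched as follows. This is a minimal illustration, not the authors' computation: the report gives no standard deviation or raw scores, so the scores in the usage note are invented, and the chance value of 4.0 assumes four answer options per item, which this paper does not state. The function name is hypothetical.

```python
import math

def one_sample_t(scores, mu):
    """Return (t, df) for a one-sample t test of mean(scores) against mu."""
    n = len(scores)
    mean = sum(scores) / n
    # Unbiased sample variance (n - 1 in the denominator)
    variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
    standard_error = math.sqrt(variance / n)
    return (mean - mu) / standard_error, n - 1
```

For example, `one_sample_t([10, 12, 14, 12], 4.0)` tests four invented scores against an assumed chance score of 4 on a 16-item test.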

Although the items were not randomly selected for this measure, they
nevertheless represented 25% of the items included in the reading-
comprehension section of the test. Clearly, at least some test developers have done
little to alter passage-independent items in light of the research findings of
almost two decades. While the effects of the reader's previous knowledge
cannot be eliminated, the effects could be minimized by the use of fictional
material for the passages with accompanying questions about the activities of
an imaginary person. In spite of the reported validity of these items (SRA,
1979), the burden of construct validity rests with the authors of the tests. If
some students are able to answer "reading-comprehension" test items correctly 
without reading the passage, one can question what is being measured. 
Abstract

Previous research has indicated that students in many cases can
answer reading comprehension test questions correctly without
having read the accompanying passage. The present research
compared, in two experiments, the ability of learning disabled
(LD) students and more typical age peers to answer such reading
comprehension questions presented independently of reading
passages. In Study 1, learning disabled students scored
appreciably lower under conditions resembling standardized
administration procedures. In Study 2, reading decoding ability
was controlled for; however, the performance differential
remained the same. Results suggested a relative deficiency on
the part of LD students with respect to reasoning strategies and
test-taking skills. In addition, the validity of some tests of
"reading comprehension" was discussed.
Are Learning Disabled Students "Test-Wise?":
An Inquiry into Reading Comprehension Test Items

For many years, there has been some argument over what
reading comprehension tests "really" measure (e.g., Thorndike,
1973-1974). The most commonly observed standardized reading
comprehension item format consists of a passage and a number of
associated multiple choice questions. Reading and understanding
the passage is assumed to be a necessary pre-condition to
correctly answering the questions. After examining the
literature, however, one is forced to question the assumption of
question dependence on the stimulus passage. Preston (1964)
found that college students were able to answer reading
comprehension items with passages deleted at a rate significantly
above chance. Tuinman (1973-1974) administered five major tests
to 9,451 elementary-level students under several conditions.
Students in the no passage condition (the relevant passage had
been deleted) on the average achieved only 30% fewer correct
answers than subjects in the passage-in condition. Similar
results were obtained by Pyrczak (1972; 1974; 1975; 1976) and
Bickley, Weaver, and Ford (1968). A more recent study of passage
independence by Lifson, Scruggs, and Bennion (1984) revealed that
passage-independent items are still quite common in elementary
level achievement tests. College undergraduates were able to
answer 75%, or almost 12 of 16 questions on the Stanford
Achievement Test, Level P-3, without reading the associated
passages. This score is considerably above that expected by
chance responding.

Scruggs, Bennion, and Lifson (1985) interviewed elementary
age students regarding their responses on a reading comprehension
test. They found that students often chose their answers based
upon their own prior knowledge, rather than content of the
reading passage. When students reported using such prior
information, they answered correctly in over 60% of the cases.

Reading comprehension items which are independent of the
associated passage can be answered on the basis of the following:
(a) general knowledge, (b) interrelatedness of the questions on a
particular passage, and (c) faulty item construction, i.e., keyed
option is twice as long or more precisely stated (Pyrczak, 1975).
In the first two cases, the presence of enough information in the
question stem to identify the topic is an important factor (e.g.,
"Which of the following statements is NOT true of penguins?").
Such a stem may render a question answerable in terms of
information already available to the examinee, and provide clues
to the answers of related questions about the same passage that
lack such information in the stem ("This passage is about: [a]
birds of South America, [b] birds of the Antarctic . . . etc.").
These cues, which individuals apply to a testing situation to
maximize their scores, correspond to Millman, Bishop, and Ebel's
(1965) criteria of test-taking skills, or "test-wiseness."
While test constructors may be able to point to high
validity coefficients for their reading comprehension tests and
subtests, an important question arises as to whether all
students are equally able to answer questions with the above
mentioned characteristics without reading the passage, or whether there are
groups of students at a relative advantage/disadvantage in
ability to answer these questions without reading the passage.

To answer this question, a group of students classified as
learning disabled and a group of regular classroom students were
administered a selection of multiple choice reading comprehension
questions with the relevant passages removed. The conditions of
this experiment were meant to resemble those of a normal
testing situation, i.e., students were required to read the questions
without assistance. This did not permit us to determine the
extent to which any observed differences between the regular and
learning disabled students were due to reasoning or variations in
general knowledge between the two groups, or simply reflected a
difference in reading ability. To address this issue, a second
experiment was performed to see if similar differences could be
found when word reading was controlled for.

Study 1

Method

Subjects and Materials

Subjects consisted of 67 regular classroom and resource room
third grade students selected from several elementary schools in
a western rural area. Of these students, 52 were regular
classroom students and 15 were classified as learning disabled by
P.L. 94-142 and local criteria, which included a 40% discrepancy
between actual and expected performance in two areas of academic
functioning. The average grade equivalent of the total reading
score of the non-LD students on the Comprehensive Test of Basic
Skills (CTBS) was 3.4 (SD = .8), while the average CTBS total
reading score for the LD students was 2.1 (SD = .5).

Fourteen multiple choice reading comprehension questions
without the accompanying passages were selected for this task.
Items were drawn from the Stanford Achievement Test, Level P-3,
Form E (1982). Items had been chosen to represent questions
thought by the author to be answerable in terms of: (a) the
general knowledge of the test taker, and (b) the degree to which
the interrelatedness of the items served as a cue to the answers.
These items were taken from the Lifson et al. (1984) study, in
which college students' ability to answer these questions had
been documented. The items were kept in clusters which belonged
together in terms of association with a particular passage.
Procedure

Treatment was administered in regular instructional
groupings. Materials were passed out and all students were told
that they were about to take a reading test for which they would
not be shown the accompanying reading passages, but that they
should try their best to answer all questions. No time limit was
imposed upon the task.

Results 

The regular classroom group answered correctly approximately
55% of the questions, for a mean score of 7.8 (SD = 1.96). This
score was significantly above a chance score of 3.5 (t[102] =
11.27, p < .001). In contrast, the learning disabled students
answered correctly only 35% of the questions, for a mean score of
4.9, only slightly higher than chance (t[28] = 1.77, ns). The
obtained score of the non-LD group was significantly higher than
the LD group (t[65] = 4.91, p < .001).
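
The chance score and the percentages quoted above follow from simple arithmetic, sketched below. The four-option assumption is inferred from the reported chance score of 3.5 on 14 items; it is not stated in the text. The function names are hypothetical.

```python
def chance_score(n_items, n_options):
    """Expected number correct from blind guessing on multiple-choice items."""
    return n_items / n_options

def percent_correct(mean_score, n_items):
    """Mean raw score expressed as a percentage of items."""
    return 100.0 * mean_score / n_items
```

With 14 items and an assumed four options per item, `chance_score(14, 4)` gives 3.5. `percent_correct(7.8, 14)` and `percent_correct(4.9, 14)` give roughly 56% and 35%, in line with the rounded figures reported for the two groups.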

Discussion 

The present findings suggest that regular classroom students
are able to recognize and make use of cues in testing situations
in order to increase their scores, even when reading passages are
deleted, and "reading comprehension" supposedly cannot be
measured. Apparently, learning disabled students are not able to
benefit equally from these cues. Since neither group should have
scored above chance on a reading comprehension test with the
reading passages deleted, it is possible that a certain amount of
bias exists against children with learning disabilities on some
standardized tests of reading comprehension. Students in regular
classes, when unable to read or otherwise obtain meaning from
reading passages, are still able to answer comprehension
questions correctly. Students with learning disabilities, however, do not
seem to have these skills, and may be twice penalized for a
reading handicap: once for being less able to read and
comprehend the passage, and a second time for being unable to
"second guess" test questions, as their nonhandicapped peers are
apparently able to do.

One possible explanation for this discrepancy between
learning disabled and regular classroom students is that learning
disabled students are simply less able to read (decode) the
questions, and for that reason are less able to outguess the
test. That is, learning disabled students are less deficient in
"test taking skills" than they are in reading ability. In order
to address this question, a second study was designed, in which
ability to read would be controlled. Although the conditions in
this experiment could not parallel those of standardized test
procedures, they did allow for an assessment of the extent to
which differential scores are attributable to generally lower
reading skills.

Study 2

Method

Subjects and Materials

The 42 subjects who participated in this investigation were
different students drawn from the same population as those of
Experiment 1, and consisted of 27 regular classroom third grade
students and 15 third grade children classified as learning
disabled by P.L. 94-142 and local district criteria. Mean grade
equivalent (CTBS total reading) was 3.6 (SD = .9) for the
non-learning disabled group, and 1.9 (SD = .4) for the
learning disabled group. Materials were 14 items drawn from the
Stanford Achievement Test, Level P3, Form F, and were chosen on
the same basis as those used in Experiment 1. Pages of the test
were again left intact with questions left in the original order
and the passages themselves blacked out during the copying
process.
Procedure 

Students were informed by their teacher that they were about
to take a reading test without reading the corresponding
passages. They were told to listen while the teacher read each
item, and then answer the items. All students were given
sufficient time to answer all questions.

Results and Discussion 

The students in regular classrooms answered correctly 65% of
the 14 items, for a mean score of 9.14 (SD = 1.8). The learning
disabled students, on the other hand, answered correctly only 45%
of the items, for a mean score of 6.33 (SD = 1.8). Although both
obtained scores are well above chance (t[52] = 12.02, and t[28]
= 4.325, ps < .001, for the regular classroom and learning
disabled students, respectively), the regular classroom group
maintained its advantage over the learning disabled students,
t(40) = 4.87, p < .001. The results suggest that learning
disabled students may be less likely to apply test-taking
strategies to reading comprehension questions to a degree of
efficiency similar to their non-disabled counterparts.
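
The between-group comparison can be approximately reproduced from the summary statistics given above. The sketch below assumes a pooled-variance (equal-variance) independent-samples t test, which matches the reported 40 degrees of freedom but is not confirmed by the text. The function name is hypothetical.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t (pooled variance) from summary statistics."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df
```

With the Study 2 values, `pooled_t(9.14, 1.8, 27, 6.33, 1.8, 15)` yields a t of about 4.85 on 40 degrees of freedom. This is close to the reported 4.87; the small difference is plausibly due to rounding in the published means and standard deviations.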

General Discussion 
In Study 1, regular third grade classroom students were seen
consistently to outscore their learning disabled counterparts on
a test of reading comprehension questions with corresponding
passages deleted, and administered under conditions resembling
standardized testing procedures. In Study 2, regular class third
graders again outscored LD students, under conditions for which
reading ability was controlled. The ability of third grade
children in these cases to score 55% and 65% correctly on
questions which refer to non-existent passages seems remarkable,
and brings into question the issue of what some tests of
"reading comprehension" are really measuring. Such passage
independent items have been thought to assess test-taking skills
and in fact have been used as measures of "test-wiseness" (e.g.,
Derby, 1978). Although it is suggested that differences in the
use of test-taking strategies (such as use of prior knowledge,
deductive reasoning, and elimination of implausible options) were
responsible for much of the observed performance differences,
other explanations are possible. Factors such as oral language
decoding ability, attentional deficits, and test anxiety may have
played a part in inhibiting performance on the part of the LD
students. The role of these other factors in LD test performance
is currently being investigated by the present authors (Taylor &
Scruggs, 1983). Whatever such tests measure, it is clear that:
(a) it is not "reading comprehension," and (b) children
classified as LD are at an apparent disadvantage.

An argument can be made that these comparisons are of
trivial importance, since in standardized test administration,
passages are not deleted; that all children in fact have equal
access to passages which contain answers to reading comprehension
questions. Although this argument has a certain face validity,
some problems remain. First, since non-LD students can score so
high on such items without reading the passages, the extent to
which scores are a direct measure of "reading comprehension"
seems uncertain. Second, since nearly all such tests are timed,
students with incomplete understanding of relevant passages but
possessing an ability to "outguess" test questions under time
constraints clearly are at an advantage with respect to students
not possessing such an ability. In this case, differences in
scores on reading comprehension tests may in fact reflect in part
a bias toward students with superior ability to respond to
specific cues in the test-taking situation. As has been seen in
the present experiments, LD students may well find themselves on
the negative side of any such bias.

Two steps may be taken to help alleviate this potential
source of bias. First, achievement tests should be revised so
that reading comprehension tests directly assess comprehension of
the provided passage. In fact, an informal review by the present
authors of the major achievement tests indicates that many
achievement test questions appear to be much less "passage
independent" since the work of Tuinman (1973-1974) and others of
a decade ago. Second, it seems possible that at least some of
these "test-taking skills" can be trained, and that this training
may do much to correct this apparent disadvantage. The authors
are at present investigating the effectiveness of such training
(Taylor & Scruggs, 1983). Although such improved scores on tests
may not necessarily reflect increased achievement, these scores
could reflect more accurately achievement gains students have
made, as evaluated by standardized achievement tests.
LEARNING DISABLED STUDENTS'
SPONTANEOUS USE OF
TEST-TAKING SKILLS ON
READING ACHIEVEMENT TESTS



Thomas E. Scruggs, Karla Bennion, and Steven Lifson



Abstract. The present investigation was undertaken to identify the type of
strategies learning disabled (LD) students employ on standardized, group-
administered achievement test items. Of particular interest was level of strategy
effectiveness and possible differences in strategy use between LD and nondisabled
students. Students attending resource rooms and regular third-grade classes were
administered items from reading achievement tests and interviewed concerning
the strategies they had employed in answering the questions and their level of con-
fidence in each answer. Results indicated that (a) LD students were less likely to
report use of appropriate strategies on inferential questions, (b) LD students were
less likely to attend carefully to specific format demands, and (c) LD students
reported inappropriately high levels of confidence.



Since the seminal article by Millman, Bishop,
and Ebel in 1965, attention has been focused on
test-taking skills, or test-wiseness, as a source of
measurement error in group-administered
achievement tests (Sarnacki, 1979). Defined as
"a subject's capacity to utilize the characteristics
and formats of the test and/or the test-taking
situation to receive a high score" (Millman et al.,
1965, p. 707), test-wiseness is said to include
such diverse components as guessing, time-use,
and deductive reasoning strategies. Given that
the effective use of such strategies may have little
relationship to a particular academic content
area, individuals or groups of individuals lacking
in these skills may be at a disadvantage. A
recently completed meta-analysis, for example,
suggested that under certain circumstances, low-
SES students are more likely to benefit from
achievement test coaching than higher SES
students, a finding which implies that the
former group of students are relatively deficient
in test-taking skills (Scruggs, Bennion, & White,
in press).

The present investigation was concerned with
learning disabled (LD) children's spontaneous
use of such strategies. Part of a larger investiga-
tion involving test-taking skills of exceptional
students (Taylor & Scruggs, 1983), this study
was conducted to identify possible deficits in test-
taking skills on the part of LD children. Such
deficits, if uncovered, would be helpful in
developing remediation techniques.

Although much research has been conducted
on nonhandicapped populations' test-taking
skills (see Bangert-Drowns, Kulik, & Kulik,
1983; Sarnacki, 1979; and Scruggs et al., in
press, for reviews), little is known about LD
students' test-taking skills. Scruggs and Lifson
(1985) recently investigated LD students' dif-
ferential ability to answer passage-independent
reading-comprehension test items (i.e., reading-
comprehension test items for which relevant
passages had been omitted). Items were taken
from standardized achievement tests known
from previous research findings to be answerable
by individuals who had not read the associated
passage (Lifson & Scruggs, 1984), and thought
to be good measures of test-wiseness. In two
experiments nonhandicapped children scored
55% and 65% correct, respectively, on such
items, whereas LD students from the same grade
scored much lower, even when word reading
ability was controlled. Scruggs and Lifson (1985)
argued that such findings also raised the ques-
tion of what reading-comprehension tests do
measure, since no reading-comprehension test
items should be answerable without prior
reading of the associated passage. Scruggs and
Lifson concluded that LD children may be at a
relative disadvantage with respect to such test-
taking skills as guessing, elimination, and deduc-
tive reasoning strategies.

Scruggs, Bennion, and Lifson (1985) 
employed interview techniques to determine the 
nature of the strategies elementary-school 
children spontaneously produced on reading-
achievement tests. Students representing a wide
range of age and ability levels were given
reading-achievement test items appropriate to 
their individual reading levels. Results indicated 
that students employed a wide range of 
strategies far beyond simply knowing or not 
knowing the answer, and that the use of these 
strategies was strongly predictive of perfor-
mance. These findings provided valuable
general information about the manner in which 
children respond to reading-achievement test 
items. However, the diversity of the population 
in age and achievement level was thought to 
have obscured observation of specific differences 
in test-taking skills between age or ability levels. 

The present investigation, therefore, was in- 
tended to determine whether LD and nondis- 
abled students differed in strategy use on read- 
ing-achievement tests. In this study, grade level 
was held constant and the number of subtests 
was reduced to two: a reading-comprehension 
subtest in which direct referring, elimination, and 
deductive reasoning strategies were thought to 
be important and a letter-sound subtest in which
close attention to format demands was con-
sidered essential. In addition, since level of
reported confidence had been found to be a 
strong predictor of performance (Scruggs et al., 
1985) and a prerequisite to strategy monitoring, 
confidence reports were examined for possible 
differences between ability groups. 

Subjects

Subjects were 32 third-grade students attend-
ing public schools in a Western university com-
munity. Twelve subjects were classified as LD
according to local school district criteria, which
included a 40% discrepancy between ability and
performance in two academic areas and com-
pliance with PL 94-142 regulations. Twenty sub-
jects were regular-class students, none of whom
had been referred for special services or were
considered by their teachers to be functioning at
the highest achievement levels. Although the LD
and regular-class students attended different
schools, the schools were adjacent, drawing
their populations from the same middle-class
community. None of the students qualified for
their schools' free lunch program. General
cognitive ability appeared to be similar for the
two groups. Mean full-scale IQ for the LD
students (Wechsler Intelligence Scale for
Children-Revised) was 92.75 (SD = 5.7). Mean
Cognitive Skills Index for the non-LD students
(Test of Cognitive Skills) was 96.16 (SD = 9.5).
Mean grade equivalent for reading comprehen-
sion on the Comprehensive Test of Basic Skills
(CTBS) for non-LD subjects was 3.9 (SD = .89),
equivalent to a percentile score of 61. For LD
students the mean CTBS reading-comprehen-
sion grade equivalent was 2.3 (SD = .29),
equivalent to a percentile score of 21. The 16
boys and 16 girls constituting the sample were
8-9 years old and Caucasian. Sex was evenly
represented in both subject groups.
Materials 

Two reading tests were constructed from items
taken from the Stanford Achievement Test. Test
items were drawn from the Primary 2 battery for
the instrument used with the LD group, whereas
the Intermediate 1 level served as the source for
the regular classroom group. Each test contained
three reading passages with 14 dependent ques-
tions (10 content, 4 inference) on each form.
Comprehension questions were left in their 
original order in relation to the selected passage. 
Questions were renumbered to avoid gaps
where passages did not follow the sequential 
order of the original test. In addition, three items 
from the letter-sound test (level P3) were 
selected. These consisted of a stimulus word in 
which a letter or letters were underlined to repre- 
sent a sound that the student was to identify 
among three options given below the stimulus 
word. These items were selected to include a 
distractor that closely matched the initial con- 
sonants of the stimulus word. For example, in 
the item: 
     blind (with the letters nd underlined)

     ○ blink
     ○ nibble
     ○ leaned

leaned is the correct answer, since it contains the 
same sound as the underlined nd in the stem; 
blink is the distractor, since it contains the same 
initial consonant blend. 
Procedure 

Subjects, seen individually by one of two ex-
aminers, were asked to read the passages and
questions aloud and mark the answers they
thought were correct. Students were then told
that they would be asked to state if they were
sure/not sure that the selected answer was cor-
rect, and the manner in which they had chosen
the particular answer. Subjects' responses to the
questions, "How did you choose that answer?"
and "Are you sure or not sure of your answer?"
were recorded verbatim on the protocol. Words
the experimenters had previously deemed
essential to answering the questions (key words)
were marked in the examiner's copy of the in-
strument, and errors in these words were noted
as the child read aloud.
Scoring 

Test items were scored for correctness, con-
fidence in answer (sure/not sure), and type of
strategy reported. Two students from the non- 
LD group, who had misread more than 25% of 
the key words, were excluded from further 
analysis. The responses were divided into seven 
categories: 

1 = Didn't know
2 = Guessed
3 = External source of knowledge (e.g., "I
    know all fish have scales")
4 = Referred to passage (e.g., "I read it")
5 = Quoted directly (e.g., "It says here that...")
6 = Eliminated options known to be incorrect
7 = Other reasoning (e.g., "It said comforted in
    the story. That sort of means relieved.")
Each response was evaluated in terms of the 
seven categories. Percent of scoring agreement 
was assessed at 100% after the examiners 
scored 25% of each other's protocols. 
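
The agreement check described above can be illustrated with a short computation. What follows is a minimal sketch in Python, assuming that agreement was taken as the percentage of responses assigned the same category (1-7) by both examiners. The report does not state its formula, and the function name and sample codes are hypothetical.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of responses given the same category code by both raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty lists of codes")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Hypothetical category codes for five rescored responses (not study data).
first_examiner = [4, 5, 2, 6, 7]
second_examiner = [4, 5, 2, 6, 7]
print(percent_agreement(first_examiner, second_examiner))  # 100.0
```

Simple percent agreement does not correct for chance agreement; it is used here only because it is the statistic the text names.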
Figure 2. Proportion correct by strategy used on inferential items.



RESULTS 

Results of a t-test applied to percentage of key 
words read incorrectly indicated that the groups 
did not differ significantly with respect to reading
difficulty, t(29) = .37, p > .20. Overall, LD
students misread 6.6% of 30 key words,
whereas non-LD students misread 6.75% of 29
key words. 

Proportion correct by collapsed strategy group 
(inappropriate = strategies 1-3, referring =
strategies 4-5, reasoning = strategies 6-7) was
computed for item type and student group (see
Figures 1 and 2).
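
The collapsing of the seven strategy codes into three groups can be sketched as follows. This is an illustrative Python fragment only; the function names and the sample responses are hypothetical, and only the mapping of codes to groups comes from the text.

```python
from collections import defaultdict


def collapsed_group(strategy):
    """Map a strategy code (1-7) to the collapsed group named in the text."""
    if strategy in (1, 2, 3):
        return "inappropriate"
    if strategy in (4, 5):
        return "referring"
    if strategy in (6, 7):
        return "reasoning"
    raise ValueError(f"unknown strategy code: {strategy}")


def proportion_correct(responses):
    """responses: iterable of (strategy_code, is_correct) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for strategy, is_correct in responses:
        group = collapsed_group(strategy)
        totals[group] += 1
        correct[group] += int(is_correct)
    return {group: correct[group] / totals[group] for group in totals}


# Hypothetical responses for one item type and one student group.
sample = [(1, False), (4, True), (5, False), (6, True), (7, True)]
print(proportion_correct(sample))
```

In practice the function would be applied separately to each item type and student group, which yields the cells plotted in the figures.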

Strategy data were scored for appropriateness 
of reported strategy. Strategies were considered 
appropriate if students reported referring to the 
passage on a recall question (strategy 4 or 5), or 
if they reported a reasoning strategy in response 
to an inferential question (strategy 6 or 7). Pro-
portion of appropriate responses was then
entered into a 2 group (LD vs non-LD) by 2 
item type (direct recall or inferential) analysis of 
variance (ANOVA) with repeated measures on 
the item-type variable. Because of the unequal 
group frequencies, a least-squares method of 
analysis (Winer, 1971) was employed. Signifi-
cant differences were found for item type,
F(1,29) = 9.19, p < .01, and interaction,
F(1,27) = 7.58, p < .05. Figure 3 depicts
graphically the interaction effect. Although both
LD and non-LD students reported a high pro-
portion of referring-to-text strategies on recall
questions (89% vs. 77%, respectively), large
differences emerged in the proportion of reason-
ing strategies applied to inferential questions
(39% vs. 70%, respectively). Nonsignificant dif-
ferences were observed for overall group means,
F(1,29) = 1.54.
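
The appropriateness rule that produced the dependent variable for this analysis can be written out explicitly. The sketch below is in Python and rests on the rule as stated in the text (strategies 4-5 for recall items, 6-7 for inferential items). The function names and the sample reports are hypothetical, and the ANOVA itself is not reproduced.

```python
def is_appropriate(item_type, strategy):
    """Apply the stated rule: refer to the passage on recall items,
    reason on inferential items."""
    if item_type == "recall":
        return strategy in (4, 5)
    if item_type == "inferential":
        return strategy in (6, 7)
    raise ValueError(f"unknown item type: {item_type}")


def proportion_appropriate(reports):
    """reports: non-empty list of (item_type, strategy_code) pairs for one student."""
    flags = [is_appropriate(item_type, strategy) for item_type, strategy in reports]
    return sum(flags) / len(flags)


# Hypothetical strategy reports for one student (not study data).
reports = [("recall", 4), ("recall", 2), ("inferential", 6), ("inferential", 4)]
print(proportion_appropriate(reports))  # 0.5
```

Computing this proportion separately for recall and inferential items gives each student the two repeated-measures scores that entered the 2 x 2 design.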

Analysis of confidence reports revealed that 
both groups were similar with respect to reported
confidence level on referring-to-passage
strategies, with LD students reporting confidence
in 85% of the cases and non-LD students claim-
ing confidence in 92% of the instances. These
reports were similar to actual performance, with
correct scores of 81% and 86% on these items
for LD and non-LD groups, respectively. On
reasoning strategies, however, a different picture
emerged. Here regular-class students were cor-
rect on 83% of the inferential items, but reported
confidence on an average of 71% of the items.
The LD students, on the other hand, reported
being confident on an average of 95% of the
cases, while being correct in only 63% of these
cases. 
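
The contrast between reported confidence and actual performance can be made concrete by subtracting percent correct from percent confident. The following Python sketch uses only the percentages quoted above. The "calibration gap" label and the function name are introduced here for illustration and do not appear in the original analysis, and the sketch assumes the quoted percentages are directly comparable.

```python
# (percent reporting confidence, percent correct), as quoted in the text.
reported = {
    ("LD", "referring"): (85, 81),
    ("non-LD", "referring"): (92, 86),
    ("LD", "reasoning"): (95, 63),
    ("non-LD", "reasoning"): (71, 83),
}


def calibration_gap(confident_pct, correct_pct):
    """Positive values suggest overconfidence, negative values underconfidence."""
    return confident_pct - correct_pct


gaps = {key: calibration_gap(*values) for key, values in reported.items()}
for (group, strategy), gap in gaps.items():
    print(f"{group:8s}{strategy:11s}{gap:+d}")
```

On these figures the gap is small for referring strategies in both groups (+4 and +6 points), whereas for reasoning strategies it is +32 points for LD students and -12 points for non-LD students.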

Items on the letter-sound subtest were scored 
for responses which suggested attention to an in-
appropriate distractor. This inappropriate
distractor took the form of an initial consonant
blend present in the stem, but not underlined. A
comparison of the number of inappropriate
distractors by group revealed significant dif-
ferences, t(28) = 2.47, p < .05. Thus, LD
students chose the inappropriate distractor in
52% of the cases, compared to the non-LD 
children who selected the inappropriate distrac- 
tor in only 24% of the cases. 

DISCUSSION 

With reading ability controlled for, the present
sample of LD third graders differed from their 
regular-class counterparts with respect to (a) pro- 
portion of appropriate reasoning strategies 
reported for inferential comprehension ques- 
tions, (b) performance and confidence level for 
items in which reasoning strategies had been 
reported, and (c) choice of an inappropriate 
distractor on a letter-sound test. However, LD
students did not differ from their nondisabled
peers in terms of appropriate strategy use on
recall items. Generally, this sample of LD
children was seen (a) to report fewer reasoning 
strategies, when appropriate, on reading- 
comprehension test items than their regular-class 
counterparts, and (b) to be less successful on 
those items for which they reported using 
reasoning strategies. These results support those 
reported by Scruggs and Lifson (1985), who
found that LD students exhibited relatively in- 
ferior performance on a test of selected reading- 
comprehension test items for which the relevant 
passages had been removed, and for which 
reasoning strategies were thought to be 
necessary in order to answer the items correctly.

The present finding of inappropriately high 
confidence levels exhibited by the LD students 
on items for which reasoning strategies had been 
applied supports a theory of a developmental 
deficit in metacognitive abilities (e.g., Torgesen,
1977), as inappropriately high confidence levels
in task performance are often seen in younger
children. Such a deficit on the part of LD 
children is thought to be critical, since ability to 
evaluate accurately a chosen response is a prere-
quisite for effective test-taking.

LD students' tendency to attend to an inap-
propriate distractor may be a function of an at-
tentional deficit (Krupski, 1980) on test format
as much as a deficit in phonetic skills. It is unclear
whether these test-taking skills are subject to
remediation (Taylor & Scruggs, 1983);
regardless, they may reflect a source of measure-
ment error (Millman et al., 1965).

Reading comprehension seems to resist 
precise analysis and continues to be the subject 
of many theoretical orientations (Spiro, Bruce, 
& Brewer, 1980). If recall and inference are
looked upon as two components of reading 
comprehension, however, the results of the pre- 
sent investigation suggest that LD children 
demonstrate strategy and performance deficits 
on inference questions, but not on recall ques- 
tions, with reading ability controlled for. Thus, it 
may be argued that the specific deficits exhibited 
here reflect problems in reading comprehension 
rather than test-taking skills. It seems likely, 
therefore, that strategy training in such areas 
could lead to improved reading comprehension 
as well as improved test-taking skills, particularly 
since selecting and implementing appropriate 
strategies has been found to improve general 
cognitive functioning (e.g., Torgesen & Kail,
1980). In the word-study skills subtest, however, 
the LD students apparently became confused by 
specific format demands, which likely had little 
to do with the content being tested (i.e., match-
ing an initial consonant blend rather than an
underlined vowel sound). Training for this type
of strategy deficit, therefore, cannot be expected 
to bring about a concomitant increase in 
phonetic analysis skills. 

Replication is necessary to further support and 
refine these findings. The present results sug-
gested that LD children may benefit from train-
ing in (a) attending to specific format demands,
(b) identifying inference questions, and (c) 
selecting and applying appropriate strategies 
relevant to inference questions. 
Abstract. Seventy-six third- and fourth-grade children classified as learning disabled or
behaviorally disordered were randomly assigned to treatment and control groups. Students
assigned to the treatment condition were taught test-taking skills pertinent to reading achieve-
ment tests. Students were taught, in small groups over a 2-week period, in such strategies as
attending to appropriate stimuli, marking answers carefully, using time well, and avoiding
errors. Following the training procedures, students were administered standardized achieve-
ment tests in their normal classroom assignments. Results indicated that trained students
scored significantly higher on the Word Study Skills subtest of the Stanford Achievement Test.
Scores on the Reading Comprehension subtest were not affected by training. The relevance of
these findings to assessment in special education is discussed.



Successful performance in school is to a
great extent dependent upon the application of
effective learning and problem-solving strate-
gies to academic tasks. Students are often
called upon to meet particular format and task
demands of academic assignments with effec-
tive strategies for dealing with these tasks and
completing them successfully. Much of the
failure of learning disabled students in school-
related tasks has been attributed to a lack of
ability in applying such problem-solving strat-
egies (Reid & Hresko, 1980). A body of litera-
ture has been established in recent years that
documents the difficulties of learning disabled
students in employing appropriate learning
and problem-solving strategies in school. Par-
ticular deficits have been noted in the areas of
(a) attending to the critical components of a
task (Atkinson & Seunath, 1973; Hallahan &
Reeve, 1980; Hallahan, Kauffman, & Ball, 1973;
Ross, 1976; Tarver, Hallahan, Kauffman, &
Ball, 1976), (b) selecting a strategy appropriate
to addressing a particular academic task
(Mastropieri, Scruggs, & Levin, 1985;
Torgesen, 1977; Torgesen & Goldman, 1977),
and (c) effectively employing appropriate
problem-solving strategies (Hallahan, 1975;
Spring & Capps, 1974; Torgesen, Murphy, &
Ivey, 1979).

Given these documented deficiencies, it 
would appear that one area of particular diffi- 
culty for learning disabled and perhaps other 
mildly handicapped children would be the 
attentional and problem-solving strategies nec-
essary for successful completion of standard-
ized achievement tests. In these group-admin-
istered tests, learners are typically expected to
function individually in large-group situa- 
tions, effectively deal with time constraints, 
and develop and employ strategies specifically 
suited to answering questions that may be 
ambiguous or to which the answers are often 
not completely known (Haney & Scott, 1980).
Some recent research with learning disabled
(LD) students indicates that these students do,
in fact, exhibit deficiencies with respect to use
of effective strategies in standardized test- 
taking situations. 

Scruggs and Lifson (1985) administered
questions from standardized reading compre-
hension tests to LD and non-LD students with-
out providing the accompanying reading pas-
sages. Their results indicated that, although
non-LD students were able to answer most
"reading comprehension" questions without
reading the accompanying passages, LD stu-
dents were less successful. This investigation
reiterated previously asked questions concern-
ing what reading comprehension tests actually
measure, and also suggested that many LD 
students may lack some specific test-taking 
strategies, such as effective use of partial 
and/or prior knowledge, error avoidance, and 
elimination strategies. 

Drawing upon a previous investigation with 
mostly nondisabled children (Scruggs,
Bennion, & Lifson, 1985a), Scruggs, Bennion,
and Lifson (1985b) recently interviewed learn-
ing disabled and nondisabled children with
respect to the manner in which they had inter-
preted and answered reading achievement test
items. Analysis of these strategy reports indi-
cated that (a) LD students were less likely to
select and utilize strategies appropriate to dif- 
ferent types of test questions, and (b) LD stu- 
dents were more likely to be negatively influ- 
enced by misleading distractors. Such results 
suggested that learning disabled and perhaps
other mildly handicapped populations may
have more difficulty than other students adapt-
ing to the specific task and format demands of
standardized achievement tests and, conse-
quently, resulting scores may be less valid
estimations of potential performance than
those of other students. 

Although any observed deficit in 
"test-taking strategies" on the part of mildly
handicapped children would be expected to be
representative of more global problem-solving
strategy deficits in school-related tasks on the
whole, it may be possible that specific training 
in test-taking skills may be particularly bene- 
ficial to children referred for learning and/or 
behavior problems. Scruggs, Bennion, and Lifson
(1985b) hypothesized that, due to differences 
in format and strategy demands, strategies ap- 
propriate for word analysis subtests may be 
more easily trained than strategies appropriate 
for reading comprehension subtests. 

Previous attempts have been made to im- 
prove achievement test scores in regular class- 
rooms by coaching in test-taking skills, but the 
results have been somewhat mixed and seem 
to have had a differential effect on different 
populations. Scruggs, Bennion, and White (in
press), in a recent meta-analysis, reported that 
students from the primary grade levels and 
students from low socioeconomic backgrounds 
tended to differentially benefit from extended 
training in test-taking skills. This finding does
suggest that mildly handicapped students may 
also benefit from instruction in some of the 
critical skills they apparently lack when con-
fronted with standardized achievement tests.

Scruggs and Tolfa (1985) recently reported 
the training of test-taking skills to a small 
sample of LD children. After eight training 
sessions had been completed, experimental 
and control students were administered a re- 
duced version of the Stanford Achievement 
Test (SAT) reading subtests. Results indicated 
that the experimental students scored signifi-
cantly higher on the shortened SAT subtests.
Although these results are encouraging, sev-
eral questions remain. First, could a larger
group of mildly handicapped children, includ-
ing behaviorally disordered (BD) students, be
shown to gain from such training? Second,
would this training transfer to a standardized
administration of the SAT? Finally, if this
training could be shown to be successful, it
would be interesting to know the actual size of
the effect in percentile points, so that an esti-
mate of the practical importance of the treat-
ment could be made. It was the purpose of the
present investigation to address these issues.

METHOD 
Subjects 

Subjects were 76 third- and fourth-grade stu- 
dents attending resource rooms or self- 
contained classes in a large western metropol-
itan school district. A group of second-grade
LD and BD students was initially intended for
inclusion in this study, but was dropped due
to methodological problems involving sample
selection and subject attrition. Of the subjects,
40 students were third graders and 36 were
attending fourth grade classes; 54 were boys
and 22 were girls. Reading achievement test
data are given in the "Results" section. In all,
50 students were classified as BD and 26 as LD
according to federal, state, and local school
district criteria. For behavioral disorders, the
definition included students whose behavioral
or emotional functioning over time adversely
affected educational performance and required
special education service. For learning disabil-
ities, the definition included a 40% discrep-
ancy between ability and achievement. Al-
though specific academic deficiencies were not
criteria for BD classification, a separate analy-
sis of achievement scores of LD and BD chil-
dren in this particular district indicated that
differences in academic achievement between
the two groups were trivial (Scruggs &
Mastropieri, in press). Eighteen students were
enrolled in self-contained classes, and 58 stu- 
dents were attending resource rooms. Subjects 
were stratified by grade level and randomly 
assigned to experimental and control groups, 
without regard to category of exceptionality. 

Materials 

Materials were developed as part of a larger 
project involving improving test-taking skills 
of LD and BD elementary students (Taylor & 
Scruggs, 1983) and consisted of eight scripted
lessons for each grade level in a direct instruc- 
tion format and accompanying workbooks for 
students, which included pencil-and-paper 
practice activities (exact materials used are 
given in Scruggs & Williams, 1985). The gen-
eral test-taking strategies taught in these mate-
rials included attending to directions, marking 
answers carefully, choosing the best answer 
carefully, using error avoidance strategies, and 
deciding appropriate situations for soliciting 
teacher attention. In addition, specific test- 
taking strategies were taught for each reading 
subtest in the Stanford Achievement Test. 
These included structured practice in specific 
test formats for each subtest and specific appli-
cation of general test-taking strategies to each
specific subtest. For example, with respect to
the letter-sound subtest, students were taught
to employ the following sequence of strategies: 

1. Read the first word. 

2. Pronounce to yourself and think of the 
sound of the underlined letter. 

3. Carefully look at all the answer choices and 
choose the word with the same sound as the 
underlined letter. 

4. If you don't know all the words, read the
words you do know or read parts of indi-
vidual words that you may know.

5. If you are not sure of the answer, see if there 
are some answers that you are sure are not 
correct, and eliminate those. 

6. Color in the answer quick, dark, and inside 
the line. 

7. Guess if you are not sure. Never skip an
answer.

Procedure 

Experimental subjects were taught by four 
trained experimenters in small groups ranging
from one to five in size. Four 20- to 30-minute
lessons were given per week for 2 weeks.
Positive responding and attention to task were
reinforced with stickers. Immediately prior to
the training sessions, and immediately after
the last training session, students were admin-
istered a criterion test of the skills that were
taught. This was a 10-item test of test-taking
skills including questions about time using,
question asking, and elimination strategies.
The first seven sessions taught the use of test-
taking strategies within the specific context of
each of the reading-related subtests. The last
session consisted of a general review of all
previous procedures. Each day of instruction
involved extensive work with practice activi-
ties applied to practice test items.

At no time during this training procedure 
were subjects taught any information concern- 
ing the content of the test that was not already 
given in the published test directions. Within 5
days of completion of the training sessions,
students were administered the Stanford
Achievement Test. This administration was
done in the regular or self-contained classroom
settings by their regularly assigned teachers.
Although teachers were aware of the member-
ship of each student in the experimental
group, response protocols were scored by ma-
chine.
FIGURE 1. Group by subtest interaction.



RESULTS

Pre- and posttests of the experimental students 
on the criterion measure were compared sta- 
tistically by means of a correlated t-test. It was 
found that the performance on the posttest was 
significantly higher than pretest scores (p <
.01). Students scored an average of 40% correct
on the pretest, and 77% correct on the posttest. 

Eight students (5 experimental and 3 con-
trol) did not complete either or both subtests of
the SAT and were excluded from further anal-
ysis. Experimental students scored an average
of the 25.3 percentile (SD = 20.0) on the Word
Study Skills Subtest and the 16.8 percentile
(SD = 15.0) on the Reading Comprehension
Subtest of the SAT. Control subjects scored an
average of the 17.4 percentile on Word Study
Skills (SD = 18.3) and the 16.4 percentile (SD
= 15.0) on Reading Comprehension. Student
percentile scores were entered into a 2 (group)
x 2 (subtest) analysis of variance (ANOVA),
with repeated measures on the subtest variable
(Winer, 1971), which yielded significance on
subtests, F(1,66) = 4.96, p < .03, and group x
subtest interaction, F(1,66) = 7.06, p < .01.
The main overall effect by group was not
statistically significant, F(1,66) = 1.21, p < .30.
Analysis of simple effects (Winer, 1971) indi-
cated that experimental and control students
differed significantly with respect to the Word
Study Skills subtest, t(66) = 2.07, p < .05, but
not the Reading Comprehension subtest, t(66)
= -.15, p > .20. The group x subtest interac-
tion is depicted graphically in Figure 1.



DISCUSSION 

The analysis of pre- and posttest scores indi- 
cated that test-taking skills could be success-
fully taught to this sample of third- and fourth-
grade mildly handicapped children. The fact
that significant gains were made in these crit-
ical skills suggests that mildly handicapped
children at this age level do lack certain test-
taking skills which are potentially useful in
taking standardized achievement tests.

Analysis of the test data indicated that train-
ing in test-taking skills did significantly in-
crease scores on the Word Study Skills Subtest
of the Stanford Achievement Test for this sam-
ple of mildly handicapped students. The over-
all effect size for this investigation, .20, is twice
as large as the mean effect size found for
similar investigations with elementary-school-
aged nonhandicapped children (Scruggs,
Bennion, & White, in press), but similar to
that obtained for primary grade students under
conditions of extended training (for this age
group, an effect size of .10 is equivalent to
approximately one month of academic achieve-
ment). The effect size of .43 for the Word Study
Skills subtest is comparable to the mean effect
size found for children of low socioeconomic
status (SES) under conditions of extended
training, but much higher than mean effect
sizes found for higher SES children, or lower
SES children with shorter training periods
(Scruggs, Bennion, & White, in press).
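
The effect sizes cited above can be related to the percentile means and standard deviations given in the Results. The report does not state its formula; the Python sketch below assumes a standardized mean difference scaled by the control-group standard deviation, and the function names are hypothetical. Under that assumption the Word Study Skills value agrees with the reported .43. The overall value of .20 is not reproduced, since the text does not say how it was pooled.

```python
def standardized_difference(treatment_mean, control_mean, control_sd):
    """Mean difference in control-group SD units (an assumed definition)."""
    if control_sd <= 0:
        raise ValueError("control_sd must be positive")
    return (treatment_mean - control_mean) / control_sd


def approximate_months(effect_size):
    """Rule of thumb quoted in the text: an effect size of .10 is
    roughly one month of academic achievement for this age group."""
    return effect_size / 0.10


# Percentile means and control SDs as reported in the Results section.
word_study = standardized_difference(25.3, 17.4, 18.3)
reading_comp = standardized_difference(16.8, 16.4, 15.0)

print(round(word_study, 2))    # 0.43
print(round(reading_comp, 2))  # 0.03
```

Read against the quoted rule of thumb, a Word Study Skills effect of about .43 would correspond to roughly four months of achievement, although that conversion is only as good as the rule itself.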

As predicted by recent research (Scruggs,
Bennion, & Lifson, 1985b), performance was
increased on the Word Study Skills subtest 
and not the Reading Comprehension subtest. 
The fact that the Word Study Skills subtest was 
increased significantly may be a function of the 
fact that this particular subtest involves many 
format changes over a short period of time, and 
thus was more amenable to increased perform- 
ance through guided practice and feedback on 
successful skills necessary for completion of 
the subtest. Strategy deficits previously ob- 
served on the Reading Comprehension subtest, 
however, were not thought to be easily reme-
diable. These deficits included ineffective use
of deductive reasoning strategies, inability to
distinguish between recall and inferential
questions, and inappropriate levels of confi-
dence in answer choices (Scruggs, Bennion, &
Lifson, 1985b).

The finding of positive training effects repli-
cates that of Scruggs and Tolfa (1985), and
extends it to a larger population representing
different categories of exceptionality on a stan-
dardized test administration. Although the
present results are encouraging, several ques-
tions remain. First, students in this investiga-
tion were trained by project personnel in order
to ensure fidelity of treatment. The extent to
which teacher implementation would affect
the results is not known. An argument can be
made that, since control subjects did not re-
ceive "placebo" treatment (i.e., noninstruction-
al contact with the experimenters for an equiv-
alent trial period), the observed effects may be
due to a reaction to the novelty of experi-
menter contact and not the training procedure.
A decision was made not to deliver placebo
training to the control group so that control
subjects would have received additional
teacher-led instruction as the comparison
treatment, and so that their instructional time
would not have been wasted on noneducation-
al treatments. Furthermore, the "novelty" ar-
gument seems untenable because (a) a recent
meta-analysis by the present authors indicated
that such subtle treatments were highly un-
likely to raise test scores, and (b) such an
argument does not explain why only one, and
not both, subtest scores were raised.

The overall sample size, the fact that sub-
jects were not stratified by category of excep-
tionality, and the disproportionally small
number of LD students in the present sample
did not allow sufficient power (Cohen, 1969) to
separately assess the effects for LD vs. BD
students, although it may be interesting to do
so in future research. Also, it is not certain
which training procedures were most respon-
sible for the observed effects. It is likely, how-
ever, that training in strategies needed for
meeting specific format demands was more 
beneficial than the training given in general 
test-taking strategies (e.g., time-using strate- 
gies), for the reason that a different effect was 
observed on the two subtests. Finally, the 
extent to which such training can benefit dif- 
ferent grade levels and content areas (such as 
math) remains to be seen. The present authors 
are currently investigating such possibilities
(Taylor & Scruggs, 1983). 

The usefulness of standardized achievement 
tests in special education has been, and re- 
mains, a controversial issue (see Salvia & Ys- 
seldyke, 1981) not intended to be addressed by 
the results of the present investigation. It must 
be considered, however, that the observed ef- 
fect (that of raising mean scores from the 17th 
to the 25th percentile) could be sufficient to 
prevent special education referral for some
students in schools where such test scores are
weighted heavily. The present authors do not
subscribe to the notion that special educa-
tional services are undesirable, and that stu-
dents should be "saved" from them whenever 
possible. It is our view that referral for special 
education services is a serious procedure 
which must take into account many different 
considerations, both qualitative and quantita- 
tive, and for which the ultimate goal must be 
optimal educational service delivery for the 
individual child. If standardized achievement 
tests are to be used for this purpose, then it is 
important that the score obtained be as nearly 
as possible a reflection of the child's knowl- 
edge of the content area being assessed. (In 
fact, a question has been raised concerning to 
what extent any assessment data are used for 
making placement decisions. See Ysseldyke, 
Algozzine, Richey, & Graden, 1982, for a dis-
cussion of this issue.)

To this end, training in test-taking skills may 
be useful. There are other ends, however, 
which we feel ought to be considered in such 
training. Since the skills trained in the present 
investigation apparently did transfer to a stan-
dardized test situation, it seems likely that
similar training may generalize to other related 
tasks, for example, for older students, taking a 
driver's test or an aptitude test related to a 
specific employment opportunity.

Finally, test taking can be viewed simply as 
a common task in today's schools, but not a
particularly pleasant experience to a mildly
handicapped student who typically performs
poorly, or who does not fully understand test-
ing conventions and formats. In this case,
training in test-taking skills could be regarded 
as another means to improve the ability of the 
individual child to function in the outside 
world, a goal to which all special educators 
aspire. 
ATTITUDES OF BEHAVIORALLY DISORDERED
STUDENTS TOWARD TESTS*

THOMAS E. SCRUGGS, MARGO A. MASTROPIERI,
DEBRA TOLFA, AND VESNA JENKINS
Utah State University

Summary. In two studies, attitudes reported toward testing by behavior-
ally disordered students and their regular classroom counterparts were com-
pared. In Study 1, 12 behaviorally disordered and 25 average fifth and sixth
graders were given a survey regarding their attitude toward tests and the test-
taking experience. Students classified as behaviorally disordered reported less
positive attitudes toward tests than their more average peers; these attitude
differences were more pronounced on items which reflected subjective attitudes
toward the test-taking situation and aspirations about performance and less
pronounced on evaluation of the value of tests. In Study 2, which employed a
sample of 25 behaviorally disordered and 23 regular classroom students matched
on age and sex and used a longer attitude measure, differences were not found.
Taken together, these studies suggest that attitudes toward tests are inconsistent
in the two populations and that some behaviorally disordered students may not
differ so much in this regard as supposed.

Students classified as having behavioral disorders have often been said to 
exhibit deficiencies in academic performance as measured by standardized
achievement tests (Motto & Wilkins, 1968; Stone & Rowley, 1964). Kauff-
man (1981) has reviewed several studies which examined the academic achieve-
ment characteristics of behaviorally disordered students and concluded that 
often the performance of these students falls far below their potential. Bases 
of these academic deficits are not completely understood, but it is commonly 
thought that behavioral disorders exhibited by this population have a negative 
effect on academic achievement. It is possible, however, that other factors also
play a role in the generally lower functioning of behaviorally disordered stu-
dents. One of these factors may be a possible difference in attitude toward
the evaluation process, particularly as evidenced by achievement tests. Since
no data document possible differences in attitudes toward tests and the test-
taking situation, the present pilot investigation was intended to provide infor- 
mation on whether behaviorally disordered students may differ from their more 
average peers with respect to attitudes with which they approach the test-taking 
situation. Results of such an investigation would not be expected to indicate 
causal relations between attitudes and test performance but might be of value 
to researchers interested in differences in characteristic performance on achieve- 
ment tests between behaviorally disordered and more average students. 

*The research described here was supported in part by a grant from the Department of
Education, Special Education Programs, No. G008300008. The authors thank Ms.
Cathy Smith, Coordinator of Special Education, Hillview Elementary School, Salt Lake
City, Utah, for her assistance with this project. Address requests for reprints to Thomas
E. Scruggs, Ph.D., UMC 68, Utah State University, Logan, Utah 84322.
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Study I 

Method 

Subjects were 37 fifth and sixth grade students attending a public school in a
western metropolitan community. Twelve of these students had been classified as
behaviorally disordered, and 25 were more typical fifth and sixth graders attending regular
classes in the same school. The principal criteria for identification as behaviorally
disordered were average ability coupled with social or emotional functioning substantially
different from that ordinarily shown by some other students and supported by teachers'
and psychologists' observations and reports. Identification as behaviorally disordered
occurred after less intensive educational and psychological interventions had not
remediated the observed deficiencies. All 12 behaviorally disordered students were attending
a self-contained class in the same school as the more average fifth and sixth graders.
The two groups were evenly distributed with respect to grade: the sample of more
average students contained 12 fifth and 13 sixth graders, while the behaviorally disordered
sample contained 6 fifth and 6 sixth graders.

The 12-item Test Attitude Survey was constructed as part of a larger investigation
involving the test-taking skills of learning disabled and behaviorally disordered students
(Taylor & Scruggs, 1983) and contained such items as "taking a test bothers me,"
"it is important for me to do well on a test," and "tests are unfair." "Yes" or "no"
responses indicating agreement or disagreement with the associated statement were
solicited for each statement. Internal consistency of this survey had been reported as
.78 (Kuder-Richardson 20) on a previous administration to regular class elementary
school students, indicating a moderate level of reliability for a survey of this nature.
Students were given the survey during regular classes and wrote an answer to each
question as the teacher read each item aloud. Students were given 1 point for a positive
response (i.e., "yes" to a positive statement, or "no" to a negative statement) and 0
points for a negative response. Tests were scored by independent scorers unaware of
group membership.
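
The scoring rule and the Kuder-Richardson 20 coefficient described above can be sketched in Python. This is only an illustration under stated assumptions, not the authors' scoring procedure: the function names, the `positively_worded` flag, and the small response matrix are hypothetical, and the sketch uses the population variance of total scores, which is the usual convention for KR-20.

```python
from statistics import pvariance


def score_item(response: str, positively_worded: bool) -> int:
    """Return 1 for a positive response, 0 otherwise.

    A positive response is "yes" to a positively worded statement
    or "no" to a negatively worded one.
    """
    said_yes = response.strip().lower() == "yes"
    return 1 if said_yes == positively_worded else 0


def kr20(scores: list[list[int]]) -> float:
    """Kuder-Richardson 20 for a students-by-items matrix of 0/1 scores."""
    n_students = len(scores)
    n_items = len(scores[0])
    # Sum of p * q over items, where p is the proportion of students scoring 1.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in scores) / n_students
        sum_pq += p * (1.0 - p)
    # Population variance of the students' total scores (assumed convention).
    total_variance = pvariance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1.0 - sum_pq / total_variance)


# Hypothetical data: 4 students by 3 items (not taken from the study).
example_scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
example_reliability = kr20(example_scores)
```

With a real data set, each row would hold one student's 12 item scores produced by `score_item`, and `kr20` would return a coefficient comparable to the .78 reported for the earlier administration.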
Results

The reliability of the survey for the present sample was .76 (KR-20),
which was consistent with previous reports. Comparison of total scores for the
two groups indicated that the average group of students had scored more
positively than the behaviorally disordered group. The regular fifth and sixth
graders reported 63% positive responses (M = 7.6, SD = [illegible]), while the
behaviorally disordered students reported 47% positive responses (M = 5.6,
SD = 2.4), a statistically significant difference (t(35) = 2.80, p < .01).

In a supplementary analysis, factor analysis of responses for the group as
a whole yielded three factors with eigenvalues greater than 1.00, which
accounted for 67.5% of total test variance. A principal components analysis,
using Kaiser's criterion for factor limitation, 1's in the diagonal, and varimax
rotation (SPSS, 1983) yielded factors of personal feelings about tests (e.g.,
"taking a test makes me upset"), personal importance of tests (e.g., "It is
important for me to do well on a test"), and evaluation of the worth of tests
(e.g., "tests are unfair"). Items which loaded most highly on each factor were
compared between the two groups by means of t tests. The two groups again
differed on the first factor, subjective feelings about tests (t(35) = 2.34,
p < .025), and Factor 2, subjective importance of tests (t(35) = 2.46, p < .02);
the two groups did not differ with respect to the third factor, evaluation of the
value of tests (t(35) = .84, p > .05).

Discussion

Present results suggest that this sample of behaviorally disordered children
differed from their peers in attitudes expressed toward tests and the test-taking
situation. Although the two groups did not appear to be different with respect
to evaluation of the role of tests, they did differ in their personal feelings about
tests. These findings seem to suggest that, although the present sample of
behaviorally disordered students appeared to appreciate the worth or
importance of tests, they reported much less positive personal feelings about tests.

Several issues, however, can be raised which preclude drawing conclusions
from the present findings. First, the sample of behaviorally disordered students
is of insufficient size to permit generalizations to a larger population or further
subdivision, e.g., by sex. Second, the attitude measure had too few items to
draw firm conclusions regarding subtest performance. Study 2, then, was
conducted to (a) confirm the present findings on a larger sample of behaviorally
disordered students and (b) expand the attitude survey to contain more
subtest items.

Study 2 

Method 

Subjects were 75 regular classroom students representing Grades 3 to 6 in a western
metropolitan public school, and 25 students attending self-contained classes for students
with behavioral disorders, Grades 3 to 6, in the same school. A different test attitude
survey was constructed to include two subtests of items suggested by the factor analysis
of Study 1: (a) items which reflected feeling about self in a testing situation (e.g., "I
feel good when I take a test") and (b) items which reflected feelings about the value
of tests themselves (e.g., "Tests help the teacher to see what we know"). This
instrument had been piloted on a different sample of 55 elementary school students.
Assessment of reliability gave a KR-20 of .74 for 22 items, and the two Subtests a and b, above,
correlated weakly with each other (.11). This low correlation suggested that separate
aspects of testing attitudes were being assessed.

The 22-item measure was then administered to the sample of behaviorally disordered
students and their peers in the students' regular classrooms. Items were read to the
students by their teachers.

Results

Reliability (KR-20) of the attitude measure was .75. Reliability of the
subtest of "personal feelings" items was .74, while reliability of the "value of
tests" subtest was .59. Because the two groups differed in distribution of age
and sex, 25 subjects were drawn from the peer group which were matched with
the behaviorally disordered students on these variables. The resulting samples
were virtually equivalent with respect to age (126.0 mo. vs. 125.9 mo. for
behaviorally disordered and regular class, respectively) and sex distribution
(21 members of each group were boys).

Analysis of attitude responses indicated that groups did not differ with
respect to total score, score on "personal feeling" items, or score on "value of
tests" items (|t| < 1.00 in all cases). On total items, scores for behaviorally
disordered and regular classroom students were, respectively, 16.5 (SD = 4.4)
and 15.9 (SD = 3.1) out of a possible 22 positive responses. For "personal
feelings" items, scores were, in the same order, 10.3 (SD = 2.9) and 9.8 (SD
= 1.9) out of a possible 13 positive responses. For "value of tests" items,
scores were 6.2 (SD = 2.0) and 6.1 (SD = 1.6) out of 9 possible positive
responses. Although a further breakdown by sex might have been interesting,
the small number of girls in each group would not permit this.
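
The reported result that |t| fell below 1.00 for all three comparisons can be checked from the summary statistics alone. The sketch below is an illustration rather than the authors' analysis: it assumes an independent-samples t test with a pooled variance and 25 students in each matched group (the group size implied by the matching procedure); the function name is hypothetical.

```python
from math import sqrt


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic from group means, SDs, and sizes."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error


N = 25  # assumed size of each matched group

# Behaviorally disordered group first, regular classroom group second.
t_total = pooled_t(16.5, 4.4, N, 15.9, 3.1, N)
t_personal = pooled_t(10.3, 2.9, N, 9.8, 1.9, N)
t_value = pooled_t(6.2, 2.0, N, 6.1, 1.6, N)
```

Under these assumptions all three statistics are well below 1.00, which agrees with the statement in the text.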

General Discussion 
In Study 1, a small sample of behaviorally disordered students reported
less positive attitudes toward tests than did their regular class peers. These
differences appeared to reflect differences in personal feelings regarding the
testing situation rather than attitudes concerning the utility and value of tests
in general, although the number of items was too small for conclusions to be
drawn. In Study 2, a larger sample of behaviorally disordered and regular
students matched on sex and age did not differ with respect to reported personal
feelings about tests, attitudes concerning the value of tests, or total attitude.
Although subjects reflected several different grade levels, attitudes by grade
level could not be assessed due to the potential confounding of grade level by
classroom.

One possible reason for the discrepancy between Studies 1 and 2 is that
the subjects in Study 1 were not, for one reason or another, representative of a
larger population of behaviorally disordered students. Another possibility, and
one worthy of further investigation, is that the discrepant findings reflect the
fact that Study 2 was conducted during the beginning of the school year, when
attitudes are commonly thought to be higher, while Study 1 was conducted at
the end of the previous year after students had recently experienced testing.
Further research is necessary to assess this hypothesis. At present, however, it
may be concluded that some behaviorally disordered children might not differ
so much from those in regular classrooms with respect to attitudes toward
testing as might be thought.

FORMAT CHANGES IN READING ACHIEVEMENT TESTS:
IMPLICATIONS FOR LEARNING DISABLED STUDENTS

DEBRA TOLFA, THOMAS E. SCRUGGS, AND KARLA BENNION
Utah State University

It has been seen that children's scores on reading achievement tests vary not only
with knowledge of content, but also with the differing formats of test items. Teachers
working with learning disabled children or children with attention problems may wish
to choose standardized tests with fewer, rather than more, format changes. The
present study evaluated the number of format and direction changes across tests and grade
levels of the major elementary standardized reading achievement tests. The number
of format changes varies from one change every 1.2 minutes on the Metropolitan
Achievement Test Level El to one change every 21.3 minutes on the P1 level of the
Stanford Achievement Test. Teachers may wish to take this evaluation into account
when considering use of standardized reading achievement tests for their students.

The validity of group-administered achievement tests for learning disabled and
remedial students has been questioned (Benson & Crocker, 1979). A score on a science
test, for example, should reflect the student's knowledge of the content area and not
be dependent on reading ability. It is important, therefore, for the test maker to recognize
bias related to such reading material and to remove that bias (Benson & Crocker, 1979).
Another potential source of bias has been identified as test formats and format changes
(Carcelli & White, 1981). In one study of reading achievement, children's responses to
test items of the same content, presented in different formats, varied from 45 to 92%
correct (White, Carcelli, & Taylor, 1981). Although standardization procedures can
compensate in part for the influence of test formats, it is important that a student's score
reflect, as accurately as possible, his/her knowledge of the content being tested.

Children in grades lower than the fourth have attained significantly lower test scores
when the major format change of using a separate answer sheet is introduced (Cashen
& Ramseyer, 1969; Harcourt, Brace, & Jovanovich, 1973; Ramseyer & Cashen, 1971).
The skill of completing the separate answer sheet appears to be developmental in nature.
While first and second graders do not spontaneously or after training use separate answer
sheets efficiently (Ramseyer & Cashen, 1971), third graders have been successfully trained
in the use of separate answer sheets (McKee, 1967).

Learning disabled children, children with attention problems, and children
functioning below grade level may be even more adversely affected by format changes.
Scruggs, Bennion, and Lifson (in press), in a study conducted with third-grade learning
disabled students, demonstrated that LD students were more easily confused and
distracted by novel formats. These novel formats include the use of separate answer
sheets. Most standardized tests begin use of separate answer sheets in fourth grade; the
fifth-grade LD student, functioning two years behind, may also experience difficulty with
this task (Scruggs & Tolfa, 1985). Scruggs and Tolfa have demonstrated that
fourth-grade LD students do perform less accurately and with less speed on separate answer
sheets than do their normally functioning peers.

Given the extent to which different formats inhibit correct responding, and the lesser
ability of children at earlier developmental stages as well as the learning disabled
student and poor reader to adjust to major format changes, teachers of such students may
wish to consider using reading achievement tests with less frequent (rather than more
frequent) format changes. Teachers will prefer to use tests on which a student's scores
are affected more by knowledge of content than by the ability to adjust quickly to
format changes.

Reprint requests should be sent to Thomas E. Scruggs, Exceptional Child Center, Utah State University,
UMC 68, Logan, UT 84322.

Table 1

Format Change Information

                         n        n        Minutes/       n Format   Minutes/
Test   Level   Grade     Minutes  Formats  Format Change  Changes    Format

CAT    10      K.0-K.9   116      7        16.7           7          16.6
       11      K.6-1.9   57       8        5.2            11         7.1
       12      1.6-2.9   69       9        5.8            12         7.7
       13      2.6-3.9   69       11       2.8            24         6.3
       14-19   3.6-7.9   45       5        7.5            6          9
       Mean                       8        7.6            12         9.3

CTBS   A       K.0-K.9   53       5        8.8            6          10.6
       B       K.6-1.6   45       5        5.6            8          9
       C       1.0-1.9   65       6        7.2            9          10.8
       D       1.6-2.9   64       8        7.1            9          8
       E       2.6-3.9   70       8        7.8            9          8.8
       F       3.6-4.9   69       9        6.3            11         7.7
       G       4.6-6.0   60       9        5.5            11         6.7
       Mean                       7        6.9            9          8.8

ITBS   7       1.7-2.6   68       10       3.8            18         6.8
       8       2.7-3.5   68       12       2.3            40         5.7
       9-14    3-7       57       3        14.3           4          19
       Mean                       8        6.8            17         10.5

MAT    P1      1.5-2.4   45       3        15.0           3          15
       P2      2.5-3.4   40       2        3.3            12         20
       El      3.5-4.9   40       3        1.2            33         13.3
       Int     5.0-6.9   40       3        2.4            17         13.3
       Mean                       3        5.5            16         15.4

SAT    P1      1.5-2.9   85       4        21.3           4          21.3
       P2      2.5-3.9   90       8        6.0            15         11.25
       P3      3.5-4.9   80       9        6.7            12         8.9
       I1      4.5-5.9   85       8        3.1            27         10.6
       I2      5.5-7.9   85       8        2.6            32         10.6
       Mean                       7        8              18         12.5

SRA    A       K.6-1.5   97       6        13.9           7          16.2
       B       1.6-2.5   115      7        16.4           7          16.4
       C       2.6-3.5   85       6        14.2           6          14.2
       D       3.6-4.5   48       3        16.0           3          16
       E       4.6-6.5   50       4        12.5           4          12.5
       F       6.6-8.1   50       4        12.5           4          12.5
       Mean                       5        14.3           5.2        14.6

Teachers, however, do not often have the opportunity to alter district decisions 
on which standardized tests are administered. In such situations, training may be 
beneficial. Scruggs and Mastropieri (in press) demonstrated that BD and LD students 
could be successfully trained in test-taking skills involved with format changes. Scruggs 
and Mastropieri found that the more complicated the formats, the greater were the train- 
ing gains. Since format has been shown to be a variable influencing test performance, 
the present investigation intended to compare the number of format changes, across 
grade levels, of the major standardized reading achievement tests. Levels from 
kindergarten to seventh grade were included. 

Method 

Procedure 

Reading subtests of the following standardized tests were analyzed for format
changes: the Stanford Achievement Test (SAT) levels Primary 1, Primary 2, Primary
3, Intermediate 1, and Intermediate 2; the California Achievement Tests (CAT) levels
10-17; the Metropolitan Achievement Tests (MAT) levels Primary 1, Primary 2,
Elementary, and Intermediate; the Iowa Tests of Basic Skills (ITBS) levels 7-13; the
Comprehensive Tests of Basic Skills (CTBS) levels A-G; and the SRA Achievement Series
levels A-F.

A format change was defined as a variation in the number of options per item,
a change from column to row or row to column, or a change in either any part of the
item itself or the options from word to picture to passage to question to cloze item.
Comparisons across tests and grade levels were made by dividing the time allowed by the
number of formats in the test. For example, 20 minutes/4 formats means that, in this
case, there is a format change every 5 minutes. Interrater agreement was established
at 100% by two raters discussing and recoding any independent disagreements in coding.
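
The comparison index described above is a simple ratio of testing time to the number of formats (or format changes). A minimal sketch follows; the function name is hypothetical, and the second call uses the 40 minutes and 33 format changes reported for the MAT Elementary level.

```python
def minutes_per_format(minutes_allowed: float, n_formats: int) -> float:
    """Average minutes of testing time per format (or per format change)."""
    if n_formats <= 0:
        raise ValueError("n_formats must be a positive count")
    return minutes_allowed / n_formats


worked_example = minutes_per_format(20, 4)    # the 20 minutes / 4 formats case
mat_elementary = minutes_per_format(40, 33)   # roughly 1.2 minutes per change
```

A smaller value means the student has less time to settle into one format before the next appears.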

Results and Discussion 

Format information specific to each individual test is presented in Table 1. The
standardized test with the least number of formats is the Metropolitan Achievement
Test, which has an average of 3 formats across levels. The standardized test with the
least number of format changes is the SRA, which has an average of 6 format changes,
or one change every 13-16 minutes. The tests with the greatest number of formats are
the California Achievement Test and the Iowa Tests of Basic Skills, both of which have
an average of 8 formats. The standardized test with the greatest number of format
changes is the Stanford Achievement Test, which has an average of 18 format changes,
with level I2 showing 32 format changes, or a change every 2.6 minutes.

The mean of the format changes across grade levels varies from one change every
6.1 minutes at grades 2-3 to one change every 12.75 minutes at grade K.

Children's test scores vary not only with knowledge of content, but also with the
differing formats of test items. Teachers of children with learning or attentional difficulties
may wish to consider various options to help ensure that all possible bias is eliminated
from standardized tests. Teachers and school districts should consider using
standardized tests with the lower numbers of format changes. When it is not possible to change
tests administered, the teacher should provide practice and training with difficult
formats. In addition, if a teacher suspects that students have difficulty adjusting to new
formats, s/he may prefer to use a test that allows a reasonable amount of time before
switching to a different format. The number of format changes on the major
standardized reading achievement tests varies from 1 change every 1.2 minutes on the Metropolitan
Achievement Test to 1 change every 21.3 minutes on the Stanford Achievement Test.

Although the teacher should always exhibit caution when interpreting test results,
extra care should be taken when problems with format changes are suspected.

Abstract

Results of 24 studies which investigated the effects of training
elementary school children in test-taking skills on standardized
achievement tests were analyzed using meta-analysis techniques.
In contrast to previous reviews, the results of this analysis
suggest that training in test-taking skills has only a very small
effect on students' scores on standardized achievement tests.
Larger effects were noted for outcomes such as self-concept and
attitude toward tests, but these were based on a very small number
of studies. Longer training programs appear to be more effective,
particularly for students in grades 1-3 and for students from low
socioeconomic status backgrounds. Results from previous reviews of
this body of literature are critiqued and explanations offered as
to why the results of the present investigation are somewhat
contradictory to previous reviewers' conclusions. Suggestions for
further research are given.

Teaching Test-Taking Skills to Elementary
Grade Students: A Meta-Analysis
In recent years there has been increasing interest in
teaching "test-taking skills" to students. Training materials
have been developed (e.g., Mini-Tests, 1979 and Test-Taking Skills
Kit, 1980), and claims have been made that such training leads to
increased test scores (e.g., Bangert-Drowns, Kulik, & Kulik, 1983;
Fueyo, 1977; Jones & Ligon, 1981; Samson, 1984). Such training
programs are advocated primarily because achievement test results
are often used to assist in making decisions about educational
placement, programming, and evaluation. To the degree that
achievement tests are measuring test-taking skills rather than
mastery of the content being tested (e.g., reading, math),
decisions about placement, programming, and evaluation may be
incorrect (see Ebel, 1965, for additional discussion). Promoters
of teaching test-taking skills have claimed that students would
obtain higher and more valid scores if deficiencies in test-taking
skills were remediated (Ford, 1973; Fueyo, 1977; White & Taylor,
1982).

Although efforts to reduce measurement error in standardized 
achievement testing are commendable, several questions remain: 

1. Although many people have concluded that test-taking
skills training leads to increased test scores, is that position
consistently supported empirically, and what is the magnitude of
typically obtained effects?

2. Are some types of training more effective than others, 
and are some groups of children more likely than others to benefit 
from such training? 

The present investigation analyzed the results of previous
studies which had examined the effects of teaching test-taking
skills to elementary school children. Similar to recent reviews
of the "test-taking skills" literature (e.g., Bangert-Drowns et
al., 1983; Taylor, 1981), studies were included only if they
explicitly attempted to teach test-taking skills (e.g., pacing
strategies, format familiarization, deductive reasoning) as
opposed to assessing the effect of repeated test administration
(i.e., a "practice effect") or tutoring in the specific content
areas. Such test-taking skills training programs are referred to
by various names (e.g., coaching, test-wiseness, test-taking), but
all have a common goal of improving test scores by teaching skills
and strategies about how to take standardized tests. Although
children's scores on standardized achievement tests were the most
frequently examined outcome of such studies, the effects of
training on outcomes such as attitude toward tests, anxiety, and
appropriate test-taking behaviors were examined frequently enough
to also be included in our analysis.

Review of Previous Work

Several previous reviewers have examined the effects of
teaching test-taking skills (Bangert-Drowns, Kulik, & Kulik, 1983;
Ford, 1973; Fueyo, 1977; Jones & Ligon, 1981; Sarnacki, 1979;
Taylor, 1981). Although none of these reviewers focused
exclusively on the effects of teaching test-taking to elementary
school children, the findings, procedures, and the specific
studies included in these reviews are closely enough related that
the results of those reviews are instructive for the current
study. A summary of the characteristics and conclusions of these
reviewers is shown in Table 1.



Insert Table 1 about here

All of these reviews included studies which examined the
effects on standardized achievement tests with elementary school
children, although only Bangert-Drowns et al. (1983) and Taylor
(1981) reported the results separately for elementary school
children or distinguished between achievement and aptitude tests.
All previous reviewers concluded that teaching test-taking skills
resulted in substantially higher test scores. Unfortunately,
except for Bangert-Drowns et al. (1983) and Taylor (1981),
previous reviews failed to indicate the procedures or criteria for
including research studies in their review, did not cite and
critique prior reviews, and apparently only analyzed results of
the primary research included in their review in terms of the
original researcher's conclusions. As will be shown below, all of
the reviewers failed to include a substantial number of studies
with elementary aged children that were available at the time the
review was done. Consequently, there are questions about whether
the results cited in any of the reviews are representative of
available research. Conclusions about the magnitude of the effect
attributable to test-taking skills training are also unclear since
most of the reviewers stated only that differences were found, or
improvement was noted, and occasionally referred to statistically
significant differences between groups. Without knowing more
about the magnitude of the effect attributed to teaching
test-taking skills, it is difficult to draw conclusions about whether
it is wise to divert resources from other activities (e.g.,
teaching reading) to teach test-taking skills.

The most comprehensive analysis to date of the effect on
achievement test scores of teaching test-taking skills was a
meta-analysis recently completed by Bangert-Drowns et al. (1983).
Because of its apparent comprehensiveness and recency, a more
detailed critique of this review is presented below to both
justify the need for, and to lay the foundation for, the current
investigation. Bangert-Drowns et al. analyzed the effect of
teaching test-taking skills to elementary- and secondary-aged
children by computing a standardized mean difference effect size
for each study (Glass, 1977). This was a substantial improvement
from most earlier reviews which relied primarily on authors'
conclusions or tests of statistical significance without
indicating the magnitude of effects (the exception to this was
Taylor [1981]; however, her analysis focused primarily on IQ tests
and/or non-elementary-aged populations and, consequently, is not
as relevant to the current investigation). Knowing the magnitude
of improvement is very important so that practitioners can make
judgments concerning whether the investment in training can be
justified compared to what else could have been accomplished
during that time. Bangert-Drowns et al. (1983) concluded that
teaching test-taking skills raised standardized achievement test
scores by .25 standard deviations, enough to raise the typical
student from the 50th to the 60th percentile. They also concluded
that length of training program was positively related to effect
size; drill and practice was less effective than training in
"broad cognitive skills;" and effects of training were essentially
the same for elementary- and secondary-aged subjects, and were not
affected by identifiable subject characteristics or other
characteristics of the program.
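
The link between an effect size of .25 standard deviations and a move from the 50th to the 60th percentile can be illustrated as follows. This is a sketch under stated assumptions, not the computation used in the original review: it assumes normally distributed scores, divides the mean difference by the control-group standard deviation, and uses hypothetical function names and example means.

```python
from statistics import NormalDist


def standardized_mean_difference(mean_trained, mean_control, sd_control):
    """Effect size: mean difference in control-group standard deviation units."""
    return (mean_trained - mean_control) / sd_control


def percentile_after_gain(effect_size, start_percentile=50.0):
    """Percentile reached when a score rises by effect_size SDs (normal scores)."""
    unit_normal = NormalDist()
    start_z = unit_normal.inv_cdf(start_percentile / 100.0)
    return 100.0 * unit_normal.cdf(start_z + effect_size)


# Hypothetical means and SD chosen to give an effect size of .25.
example_effect = standardized_mean_difference(52.5, 50.0, 10.0)
new_percentile = percentile_after_gain(example_effect)
```

For an effect size of .25, `new_percentile` comes out just under 60, matching the 50th-to-60th percentile statement in the text.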

Although Bangert-Drowns et al. provided valuable information
about the effects of teaching test-taking skills to
elementary-aged children, their study is limited by several factors. First,
a number of studies have been done which were not included in
their review (i.e., they cite 12 different studies with
elementary-aged children published before 1981; the current
analysis reports 19 before 1981 and 24 overall). Secondly,
although Bangert-Drowns et al. apparently coded indicators of
methodological quality for each study, they did not report
analyses to determine if there were differential effects for
studies of high versus low quality. It may be, for example, that
investigations of lower quality produce effect sizes which are
substantially different (and also less credible) than studies of
high quality.

Third, their decision to average all outcomes from a given
study into one measure of effect size can be misleading. For
example, in one of the studies included in their analysis, Levine
(1980) randomly assigned low SES and higher SES fifth graders to
either test-taking training or control groups and collected data
on standardized reading achievement and an assessment of
"test-wiseness." At least four effect sizes are possible: low SES
experimental versus low SES control for reading and
test-wiseness; and higher SES experimental versus higher SES control
for reading and test-wiseness. These four effect sizes ranged
from .38 to 1.52 of a standard deviation and averaged .90. To
report only the average of two or more of these effect sizes is
not only misleading, but irretrievably obscures important
differences between types of subjects and types of outcome (e.g.,
in Levine's study, the effects for low SES subjects were much
larger than for higher SES subjects for both outcomes; and effects for
test-wiseness were much larger than for reading achievement for both
groups).

Finally, in at least one instance, Bangert-Drowns et al.
appear to have miscalculated the effect size. For example, in the
Romberg (1978) study, classrooms were randomly assigned to
treatments, and class averages were used as the unit of analysis.
While the use of classroom means as the unit of analysis is an
appropriate statistical procedure (Peckham, Glass, & Hopkins,
1969), the standard deviation of group means will generally be
much smaller than the within-group standard deviation. The use of
the between-class standard deviation will result in a much larger
effect size and will not be comparable to most other studies in
which the within-group standard deviation was used. In the
Romberg study, Bangert-Drowns et al. apparently used the
between-class standard deviation for achievement test scores and obtained
an effect size of .48 of a standard deviation. By contrast, when
the reported percentile scores are converted to Z scores and
differences in Z scores are used to estimate the effect size
(since within-group standard deviations were not reported), an
effect size based on the within-group standard deviation of only
.14 is obtained, less than one third the magnitude of the
Bangert-Drowns et al. estimate. For most studies, the Bangert-Drowns et al.
estimates of effect size were reasonably close to those obtained
in the current analysis. The few discrepancies which did emerge
emphasize the complexity and potential pitfalls of effect size
estimation required for meta-analysis.
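
The two points in the preceding paragraph can be sketched numerically. The percentile values below are hypothetical, since the percentiles reported by Romberg (1978) are not given here; they were chosen only so that the Z-score difference lands near .14. The sketch assumes normally distributed scores, and the second function assumes students are assigned to classes independently, in which case the standard deviation of class means shrinks with the square root of class size. Function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def effect_size_from_percentiles(pct_trained, pct_control):
    """Difference in Z scores corresponding to two percentile ranks."""
    unit_normal = NormalDist()
    z_trained = unit_normal.inv_cdf(pct_trained / 100.0)
    z_control = unit_normal.inv_cdf(pct_control / 100.0)
    return z_trained - z_control


def expected_sd_of_class_means(within_group_sd, class_size):
    """SD of class averages when students are independent draws."""
    return within_group_sd / sqrt(class_size)


# Hypothetical percentile ranks (not the values reported in the study).
within_group_effect = effect_size_from_percentiles(55.6, 50.0)

# With a within-group SD of 10 and classes of 25, class means vary far less.
sd_of_means = expected_sd_of_class_means(10.0, 25)
```

Dividing the same mean difference by `sd_of_means` instead of the within-group standard deviation would inflate the effect size several times over, which is why the two kinds of estimate are not comparable.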

There are other important reasons for replicating and
extending the work done by Bangert-Drowns et al. (1983). First,
some investigators believe that the training of test-taking skills
is particularly beneficial for children in low socioeconomic
settings (e.g., Jones & Ligon, 1981; Jongsma & Warshauer, 1975).
Unfortunately, Bangert-Drowns et al. did not address this issue.
Secondly, it is important to determine whether the effects of
training in test-taking skills are different for children of
different ages. In the Bangert-Drowns et al. analyses, students
in grades 1 to 6 were combined into only one category. Third, it
is important to replicate their findings about length of training
and type of training, and to determine whether there are any other
important concomitant variables or interactions among variables
not identified by Bangert-Drowns et al. Finally, it is important
to know whether studies of good methodological quality produce
different effect sizes than studies of poorer quality, and whether
there is a differential effect for different types of dependent
measures (e.g., achievement tests, measures of test-wiseness,
student attitude).

Procedure

Location of studies. Several procedures were used to find as
many studies as possible which investigated the effect of teaching
test-taking skills for standardized group-administered achievement
tests to elementary-aged school children. Studies were located by
first conducting a computer-assisted search of Dissertation
Abstracts International, Psychological Abstracts, and Educational
Resources Information Center (ERIC) data bases. Studies found in
this way were examined to determine whether they contained
references to other appropriate studies. Previous reviews of
research on teaching test-taking skills (Bangert-Drowns et al.,
1983; Ford, 1973; Fueyo, 1977; Jones & Ligon, 1981; Sarnacki,
1979; Taylor, 1981) were also examined for additional studies.
Twenty-four experimental studies of the effects of teaching
test-taking skills on achievement tests for students in grades 1
through 6 were located. This number is 100% more than the
greatest number of studies involving test-taking skills training
for elementary school children found by any previous reviewer.

Coding. Each study was coded for 14 different variables 
which described the type of subjects with whom the research was 
conducted, the type of training provided, the experimental design 
used, and the type of outcome data collected. The specific 
variables coded are reported in Table 2 in the results section. 
Two independent reviewers coded each article. Wherever 
disagreement occurred, differences were resolved by discussion. 

An effect size for each relevant comparison in each study was 
computed (Glass, McGaw, & Smith, 1981). Effect size was defined 
as the mean difference between two groups divided by the standard 
deviation of the control group. When means and standard 
deviations were not reported in a study, effect sizes could 
sometimes be calculated from other statistics such as t and F. 
Procedures for determining which effect sizes to code and methods 
of calculation when means and standard deviations were not 
available are given in Casto, White, and Taylor (1983). 
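
A minimal sketch of the effect size definition given above follows. The first function implements the stated definition (mean difference divided by the control group standard deviation). The second shows one common textbook conversion from an independent-groups t statistic; it is included only as an assumption about what such a conversion can look like, since the procedures actually used are those documented in Casto, White, and Taylor (1983). All numeric values are hypothetical.

```python
from math import sqrt


def glass_effect_size(mean_treatment, mean_control, sd_control):
    """Mean difference divided by the control group's standard deviation."""
    return (mean_treatment - mean_control) / sd_control


def effect_size_from_t(t_value, n_treatment, n_control):
    """Common approximation when only an independent-groups t is reported.
    Note: this yields an estimate based on the pooled standard deviation,
    not the control-group standard deviation."""
    return t_value * sqrt(1 / n_treatment + 1 / n_control)


# Hypothetical values, for illustration only.
print(glass_effect_size(52.0, 50.0, 10.0))
print(round(effect_size_from_t(2.0, 25, 25), 3))
```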

Detailed operational definitions for each coding variable are 
not presented here because of space limitations. Definitions for 
most of the variables are self-evident and were based on the 
author(s)' description in the original study (e.g., grade level, 
type of outcome, students' ability level). However, the coding of 
methodological adequacy for each study requires some additional 
explanation. Each study was coded as to whether it was a true or 
quasi-experimental design. Then each of the threats to internal 
validity outlined by Campbell and Stanley (1963) was coded on a 
4-point scale ranging from "0" (this "threat" is not a plausible 
alternative explanation for the observed effect) to "3" (this 
"threat" is a plausible alternative explanation which, by itself, 
could account for most or all of the observed effect). 
A point system (described in detail by Casto, White, and Taylor 
[1983]) was then used to combine information from the type of 
design and the presence and severity of threats to internal 
validity to categorize studies as to whether they exhibited 
adequate or inadequate methodological quality with respect to 
internal validity. This system has been used successfully in 
several other meta-analyses (e.g., Hubbell et al., 1985; White & 
Casto, in press; White, Myette, & Baer, 1982) and yields 
interrater agreement in the 85-95% range (Casto, White, & Taylor, 
1983). 

Obtained effect sizes were adjusted using Hedges' (1981) 
formula for bias correction of the effect size estimator before 
analyses were done. Although the correction procedure was used 
for all results in the present study, the authors agree with 
Bangert-Drowns et al. that the differences resulting from this 
correction procedure were trivial (only 1 out of 65 effect sizes 
changed by more than .01 of an effect size). 
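
To show why the correction tends to be trivial, the sketch below applies the widely used approximation to Hedges' (1981) correction factor, 1 - 3/(4m - 1), where m is the degrees of freedom of the standard deviation in the denominator. Treating m as a single value supplied by the caller is an assumption made for illustration; the effect size and degrees of freedom shown are hypothetical.

```python
def hedges_correction_factor(df):
    """Approximate small-sample correction factor, 1 - 3 / (4 * df - 1)."""
    return 1 - 3 / (4 * df - 1)


def corrected_effect_size(effect_size, df):
    """Shrink an effect size estimate by the approximate correction factor."""
    return effect_size * hedges_correction_factor(df)


# Hypothetical: an uncorrected effect size of .20 with 29 degrees of freedom.
adjusted = corrected_effect_size(0.20, 29)
print(round(adjusted, 3))
```

With degrees of freedom of this order the factor is close to 1, so a modest effect size changes by less than .01, in line with the pattern reported above.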

Results and Discussion 

The 24 investigations of the effect of teaching test-taking 
skills resulted in 65 effect sizes which were relatively evenly 
distributed among studies. The mean effect size for all 
comparisons including achievement tests, tests of test-wiseness, 
self-esteem, and anxiety, was .21 of a standard deviation^2, a 
figure which is similar to the effect found by Bangert-Drowns et 
al. However, we hasten to add that this finding has relatively 
little meaning because it is the average across different types of 
dependent measures, studies of differing quality, and students 
with different characteristics. The most meaningful findings 
begin to emerge as the data are subdivided into categories which 
are relatively more homogeneous and focused on specific issues. 

As the first step in analyzing the data, the mean effect was 
calculated for all levels of each variable coded in the meta- 
analysis. As can be seen in Table 2, the average effect size for 
studies with adequate validity is relatively close to that of 
studies with inadequate validity (.20 vs. .29; t=0.67, p > .50)^3. 



Insert Table 2 about here 



Although one might infer from this that it is not necessary to 
account for quality of study in interpreting the results of 
this body of literature, further examination of Table 2 shows that 
this is not the case. For example, the mean effect size for all 
achievement test scores is only .14. When data are subdivided 
according to the study's methodological quality, the average 
effect size from studies of adequate validity is only .10 compared 
to an average of .29 for achievement test scores for studies with 
inadequate validity (t=1.64, p= .11). Clearly, the results from 
studies which are better done are better estimates of the effect 
of training in test-taking skills. These findings contrast with 
the findings of Bangert-Drowns et al. who reported an average 
effect size for achievement tests of .25. For example, for a 
student scoring at the 30th percentile, an increase of .25 
standard deviation would result in a score at the 39th percentile, 
while an increase of .10 standard deviation would result in a 
score at only the 33rd percentile. 
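
The t values quoted for the validity comparisons can be approximated from the summary statistics in Table 2. The sketch below assumes an independent-groups t test with a pooled variance estimate; the source does not state which form of the test was used, so the function is an illustrative assumption rather than the authors' procedure.

```python
from math import sqrt


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Independent-groups t statistic with a pooled variance estimate."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_error


# Table 2 summary values: inadequate vs. adequate validity.
t_all = pooled_t(0.29, 0.33, 10, 0.20, 0.40, 55)          # all outcomes
t_achievement = pooled_t(0.29, 0.33, 10, 0.10, 0.33, 44)  # achievement tests
print(round(t_all, 2), round(t_achievement, 2))
```

Under this assumption the computed values agree with the reported t=0.67 and t=1.64 to two decimal places.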

It is also important to note that the average of 44 effect 
sizes for achievement test scores from studies of adequate 
validity is .10, while the average of 5 effect sizes from adequate 
studies measuring "test-wiseness" is .71, about 7 times as 
large (t=3.62, p < .001). There are also no measures of test- 
wiseness or measures such as anxiety, self-esteem, and attitude 
towards the test, which come from studies with inadequate 
validity. Thus, the apparent equivalence in average effect sizes 
between studies of adequate validity and inadequate validity is 
largely attributable to the fact that outcomes other than 
achievement test scores tend to be substantially higher, and all 
come from studies of adequate validity. These findings suggest 
that outcomes from achievement test scores should be analyzed 
separately from other types of outcomes and that the quality of 
study should be examined in all analyses. 

The mean effect sizes for achievement test scores from 
studies with adequate validity for different levels of length of 
treatment, SES level, and grade level are shown in Table 3. 



Insert Table 3 about here 



As can be seen, there was considerable difference between 
interventions which were less than 4 hours and those which were 4 
or more hours (.04 vs. .29; t=2.26, p < .05). A similar finding 
was seen when results of achievement test scores were broken down 
by grade level. When treatments were administered to students in 
the primary grades (1-3), the average effect size on standardized 
achievement tests was only .01. From grades 4-6, however, the 
mean effect size for achievement tests was higher, .20 (t=1.97, 
p < .06). The difference between students of differing 
socioeconomic background was very slight (.14 vs. .09; t=.45, p > 
.50), with a very small, and probably inconsequential, advantage 
for students from low socioeconomic backgrounds. 

Even more interesting than the average effect size for 
different levels of these three variables are the interactions 
between the variables. As can be seen in Figure 1, for treatments 
involving less than 4 hours, students in the primary grades 
exhibited slightly negative effect sizes (mean ES = -.12), while 
students from grades 4 through 6 had an average effect size of 
.19. For students receiving more than 4 hours of training, 
however, there is no difference: students in both grades 1-3 and 
4-6 had an average effect size of .29. Although these findings 
are based on a relatively small number of studies, the data are 
provocative and invite further investigation. More specifically, 
it appears that for older students a short amount of training in 
test-taking skills may result in substantial improvement. 
However, for younger children it takes much more training before 
there are observable benefits. For all students, there is a 
tendency for more training to be associated with larger effects, 
a finding consistent with Bangert-Drowns et al. 

Figure 2 shows another interesting interaction between length 
of training and socioeconomic status. With less than 4 hours of 
treatment, neither "low SES" nor "not low SES" subjects benefited 
appreciably (average effect sizes are -.05 and .08; t=.74, p > 
.50). With higher levels of treatment, the best estimate at this 
time is that students from low socioeconomic backgrounds benefit 
more than twice as much as students who are not from low 
socioeconomic backgrounds (average effect size = .44 vs. .20; 
t=1.20, p < .20). Again, this finding is based on a small number 
of studies and consequently requires further replication before 
confident conclusions can be drawn. Nonetheless, these findings 
provide some support for those who have contended that training in 
test-taking skills is most important for students from low 
socioeconomic backgrounds (e.g., Jones & Ligon, 1981; Jongsma & 
Warshauer, 1975). 
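
The two interactions described above can be restated as the gain associated with longer training within each subgroup. The sketch below uses only the cell means quoted in the text; the dictionary layout and function name are illustrative assumptions, and no significance is implied by the differences.

```python
def training_length_gains(cell_means):
    """For each group, return the mean effect size with 4 or more hours of
    training minus the mean effect size with less than 4 hours."""
    return {
        group: round(means["4+ hours"] - means["under 4 hours"], 2)
        for group, means in cell_means.items()
    }


# Cell means quoted in the text (achievement tests, adequate validity).
by_grade = {
    "grades 1-3": {"under 4 hours": -0.12, "4+ hours": 0.29},
    "grades 4-6": {"under 4 hours": 0.19, "4+ hours": 0.29},
}
by_ses = {
    "low SES": {"under 4 hours": -0.05, "4+ hours": 0.44},
    "not low SES": {"under 4 hours": 0.08, "4+ hours": 0.20},
}

print(training_length_gains(by_grade))
print(training_length_gains(by_ses))
```

Read this way, longer training is associated with a larger gain for primary-grade students than for upper-grade students, and with a larger gain for low SES students than for students who are not low SES.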

More specific information about what types of training 
programs are most effective could not be determined given the 
available data. Unfortunately, most authors described their 
programs in only the most general terms, and consequently it was 
impossible to determine whether certain program characteristics 
were associated with larger effects. Those programs which 
explicitly stated that they used practice tests or some type of 
reinforcement procedures to motivate students to try harder on the 
test did not have larger effects than those which did not refer to 
these components. Unfortunately, we cannot be certain if those 
who did not refer to practice tests or reinforcement procedures 
did not include them as a part of the training or simply failed to 
give a complete enough description of the training program. 

It is also important to comment briefly on the differences 
in average effect sizes between outcomes of achievement test 
scores (mean ES = .10), tests of "test-wiseness" (judging from the 
brief descriptions of these tests that were given, they would be 
more appropriately referred to as tests of test-taking skills) 
(mean ES = .71), and measures of anxiety, self-esteem, and attitude 
towards tests (mean ES = .44) (note: all of these estimates came 
from studies with adequate validity). Admittedly, the measures 
other than scores on achievement tests are based on a very limited 
number of studies, so one should be cautious in drawing 
conclusions. However, from these data it appears that training 
does have a substantial effect on measures of test-taking skills. 
Thus, it appears that those programs which measured the degree to 
which students learned the skills they were taught were generally 
quite successful in teaching those skills. However, high scores 
on tests of test-taking skills were not necessarily associated 
with higher achievement test scores, which suggests that the 
relation between test-taking skills and high scores on achievement 
tests may not be very strong. It should be remembered that the 
primary argument for providing training in test-taking skills to 
students has always been related to the need to reduce measurement 
errors in the child's standardized test score. To the degree that 
that is happening, it has been assumed that test scores would go 
up. Although the fact that test scores are not going up 
appreciably is not proof that scores are not more accurate, it 
still leaves the burden of proof upon those who claim that 
training in test-taking skills is beneficial. Higher scores on 
tests of test-taking skills demonstrate that intervenors have 
taught what they intended to teach, but are not sufficient 
evidence for the benefits of such training. 

Conclusions 

The conclusions of this review are somewhat different from 
the conclusions of all previous reviewers of the effects of 
training in test-taking skills. Considering elementary school 
children as a total group, the results suggest that training 
children in test-taking skills has very limited impact on 
achievement test scores. The average effect size from the 19 
studies with adequate internal validity is only .10 of a standard 
deviation. An effect size of that magnitude would raise a child's 
score on a standardized achievement test from the 10th to the 12th 
percentile, or the 50th to the 54th percentile. Such differences 
are not sizable and should raise questions about the generally 
proclaimed benefits for such training programs. 
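
The percentile shifts cited above can be checked with a short calculation. The sketch assumes normally distributed test scores, which is the usual basis for translating an effect size into percentile ranks; the function name is illustrative.

```python
from statistics import NormalDist


def shifted_percentile(start_percentile, effect_size):
    """Percentile rank reached after raising a score by `effect_size`
    standard deviations, assuming normally distributed scores."""
    std_normal = NormalDist()
    z_start = std_normal.inv_cdf(start_percentile / 100)
    return 100 * std_normal.cdf(z_start + effect_size)


for start in (10, 50):
    print(start, round(shifted_percentile(start, 0.10)))
# 10 12
# 50 54
```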

However, as with most educational interventions, the complete 
answer is not as simple as implied by the preceding paragraph. 
Data from the meta-analysis suggest that training in test-taking 
skills is differentially effective for various subgroups of 
children. The interactions between length of treatment and grade 
level, and length of treatment and SES are particularly 
provocative (even though they should be viewed tentatively since 
they are based on only a few studies) and deserve further 
research. In general, the meta-analysis supports the conclusion 
of Bangert-Drowns et al. that longer training programs are more 
effective. In addition, it appears that training is more 
effective in the upper elementary grades than in the lower 
elementary grades. Whether or not a training package includes 
practice tests or reinforcement strategies is not an issue about 
which there is currently sufficient data to draw conclusions. 

Should training in test-taking skills be provided instead of 
spending the same amount of time teaching reading or math? The 
answer is not clear-cut. Clearly, benefits of a tenth of a 
standard deviation are relatively small (less than one month's 
worth of gain in reading for an average third grader), but they 
were obtained at relatively little cost. Even the longest 
training program lasted only 20 hours, and the majority of effect 
sizes came from studies in which training lasted less than 4 
hours. The question also depends on whether one is talking about 
children in grades 1-3 or grades 4-6. Although the data are drawn 
from a relatively small number of studies, there is some 
suggestion that for older children a limited amount of training 
can have a discernible effect. For younger children, more 
training is necessary. Also, the fact that a few studies 
(unfortunately, it is a very limited number) suggest that training 
in test-taking skills has some positive impact on anxiety, self- 
esteem, and attitude towards tests should not be forgotten. 
However, before the benefits of training in test-taking skills are 
generally accepted, more research needs to be done. It is clear 
that a comprehensive analysis of previous research on training 
test-taking skills suggests that the benefits are not nearly so 
great as has typically been concluded. 

Should training in test-taking skills be pursued? Hopefully, 
the results of this analysis will temper some of the unfounded 
enthusiasm in support of training children in test-taking skills. 
However, it would be unwise to conclude that training in test- 
taking skills is unwarranted or detrimental. Although the effects 
of such training are small, the investment is also relatively 
small, and there is tentative evidence that for particular groups 
of children, training in test-taking skills can have substantial 
effects. Those tentative conclusions need further research, but 
indicate an area worth pursuing. 

Footnotes 

^1 Although the terms are often used interchangeably, an 
important distinction can and should be made between "test- 
wiseness" and "test-taking skills." As described originally by 
Millman, Bishop, and Ebel (1965), "test-wiseness" often refers 
to the use of strategies which enable test takers to score higher 
than would be expected based on their knowledge of the content 
being tested (e.g., when in doubt: look for noun/verb agreements 
between stem and distractor, avoid distractors that use the words 
"never" or "always," select the longest, most technical sounding 
distractor). Alternatively, "test-taking skills" refers to skills 
which enable students to more fully demonstrate knowledge they 
have (e.g., being familiar with the question/answer format, 
knowing the meaning of vocabulary used in directions, being able 
to pace oneself appropriately during timed tests). Most training 
programs for elementary school children which are described using 
terms such as test-wiseness, test-taking, or coaching, focus 
primarily on "test-taking skills" as opposed to "test-wiseness." 
Throughout the remainder of this article, the term "test-taking 
skills" will be used as a generic reference which includes the 
other frequently used terms, unless otherwise specified. 

^2 Throughout the rest of this article, effect sizes are always 
presented in standard deviation units. 

^3 There is an ongoing debate about whether inferential 
statistical methods should be used with meta-analysis data. 
Glass, McGaw, and Smith (1981) suggest that such techniques add 
little useful information since the data are more properly 
regarded as a population rather than as a sample. Furthermore, 
since the data points are probably not independent (whether or not 
more than one effect size is calculated per study), the 
assumptions of inferential techniques are not met, and it is 
unclear what effect this will have on calculations. Glass et al. 
(1981) suggest that, in most instances, the use of inferential 
techniques with non-independent data will result in more Type I 
errors, but also cite examples where exactly the opposite happens. 
Hedges (1982a, 1982b) proposes a series of "goodness of fit" tests 
for use with meta-analysis data. However, these tests suffer from 
inadequate statistical power and are inappropriate when the 
variables being analyzed are non-orthogonal (as will almost always 
be the case). Given these concerns, we have calculated t tests 
for most comparisons with the understanding that the p values may 
not be accurate. The best guideline for judging differences is 
probably a logical analysis which includes consideration of the 
absolute magnitude (differences of .25 of a standard deviation are 
generally considered moderately large), the variability of the 
data, and the number of effect sizes and studies on which an 
estimate is based. 

Table 1 

Characteristics and Conclusions of Previous Reviewers of the Effect of Teaching Test-Taking Skills 

Author/year      # of          Methods for Previous    Outcomes of    Conclusions about   Variables cited         Type of studies
                 experimental  selecting   reviewers   experimental   effectiveness of    which covary with       included
                 studies       studies     cited and   studies cited  training            effect of training
                 cited         specified?  critiqued   in terms of    test-taking skills

Bangert-Drowns   30            Yes         No          Standardized   Effective,          Length of training      Achievement tests;
                                                       effect size    ES = .25            program, type of        elementary and
                                                                                          training                secondary level

Ford/1973        24            No          No          Conclusions    Effective           None                    Achievement, IQ, and
                                                                                                                  aptitude tests; preschool
                                                                                                                  through adult

Fueyo/1977       19            No          No          Conclusions    Effective           None                    Achievement, IQ, and
                                                                                                                  aptitude tests; preschool
                                                                                                                  through adult

Jones & Ligon/   5             No          No          Conclusions    Effective           Maintenance of effect   Achievement, IQ, and
1981                                                                                      Socioeconomic status    aptitude tests; preschool
                                                                                                                  through adult

Sarnacki/1979    17            No          No          Conclusions    Effective           None                    Achievement, IQ, and
                                                                                                                  aptitude tests; preschool
                                                                                                                  through adult

Taylor/1981      34            Yes         Yes         Standardized   Effective,          Type of training,       Achievement, IQ, and
                                                       effect size    ES = .62            unit of administration, aptitude tests; preschool
                                                                                          quality of study, type  through adult
                                                                                          of test (achievement
                                                                                          vs. IQ)

Table 2 

Mean Effect Size for All Levels of All Coded Variables 





Adequate validity 
Mean ES SD_ES N_ES 


Inadequate validity 
Mean ES SD_ES N_ES 


All Studies 


.20 .40 55 


.29 .33 10 


Total sample size Small (0-75) 
for study: Medium (76-150) 
Large (150+) 


.32 .28 21 
.11 .50 24 
.15 .30 10 


.40 .46 5 
.18 .08 5 


Grade level: 1st-3rd 
4th-6th 


.03 .51 25 
.33 .39 30 


.14 .06 6 
.59 .54 3 


Socioeconomic Low 
status level: Not low 


.18 .37 37 
.24 .46 18 


.33 .36 8 
.11 .02 2 


Use of reinforcement No 
procedures as part Yes 
of training: 


.22 .40 48 
-.00 .43 7 


.29 .33 10 


Hours of training: Less than 1 hr 
1 to 3 hrs 
4 hrs+ 


.09 .43 14 
.09 .30 22 
.40 .42 19 


.37 .47 5 
.20 .13 4 


Use of practice No 
tests as part of Yes 
training: 


.22 .43 42 
.12 .30 13 


.40 .46 5 
.16 .07 4 


Ability level of Mixed 
students: High ability 
Low ability 


.20 .52 47 
.09 .21 3 
.31 .12 5 


.29 .33 10 


Type of assignment Random 
to groups: Good matching 
Poor matching 


.27 .39 40 
.24 .01 2 
-.05 .37 13 


.30 .40 7 
.28 .10 3 


Blinding of data Yes 
collector: No 


.13 .44 34 
.31 .30 21 


.16 .07 4 
.38 .42 6 


Type of outcome measure: 

Achievement test 

Test-wiseness test 

Other (anxiety, self-esteem, 

attitude) 


.10 .33 44 
.71 .57 5 

.44 .36 6 


.29 .33 10 



Mean ES = mean effect size for a particular group. 

SD_ES = standard deviation of effect size distribution for a 
particular group. 

N_ES = number of effect sizes on which a computation is based. 

Note. Several other variables including Percent Male, Percent Handicapped, 
and Percent Minority were coded to examine whether mean effect size covaried 
with such subject characteristics. Results for those variables are not 
reported here because of infrequent reporting (e.g., Percent Handicapped 
could only be coded for 2% of the ES's), or lack of variance (e.g., 97% of 
the ES's for Percent Male fell between 41% and 54%). Complete data for each 
case included in the meta-analysis are available from the authors. 

Table 3 

Mean Effect Sizes for Achievement Test Scores from Studies of 
Adequate Validity, Broken Down by Treatment Length, SES Level, and 
Grade Level 





                                  Mean ES    SD_ES    N_ES    Studies 

Less than 4 hours of treatment      .04       .30      18        7 
4 or more hours of treatment        .29       .31      13        8 
Low SES                             .14       .38      13       10 
Not low SES                         .09       .31      31       13 
Grades 1-3                          .01       .37      22        9 
Grades 4-6                          .20       .26      22        9 

Figure Captions 

Figure 1. Mean effect size by treatment length and grade level 
for achievement test scores from studies with adequate validity. 

Figure 2. Mean effect size by treatment length and SES for 
achievement test scores from studies with adequate validity. 

Do separate answer sheets inhibit the 
performance of learning disabled students? 

Yes, according to a recent study performed at Utah State University. 
LD and nondisabled students were given three subtests of the 
Comprehensive Tests of Basic Skills (CTBS) for which correct answers were 
identified in the test book. Students were instructed to record the 
correct answers on the separate answer sheet as quickly and efficiently 
as possible. Learning disabled students' performance was found to be 
slower, less accurate, and less neat than that of their nonhandicapped 
peers. Figure A shows differences between LD and regular classroom 
students with respect to accuracy and fluency on completion of the 
separate answer sheet. This discrepancy could contribute to measurement 
error in the LD population. However, it would also seem that LD students 
improved appreciably in use of separate answer sheets with practice. 
Figure B shows increase in fluency and accuracy of LD students after 
only three practice sessions with teacher feedback. 



Are learning disabled students deficient 
in test-taking skills? If so, do learning 
disabled students benefit from training? 

Yes, learning disabled students are deficient in test-taking skills. 
Scruggs (1984, 1985) found LD students differed from their 
nonhandicapped peers with respect to use of appropriate strategies on 
standardized achievement tests. These strategy deficits included use of 
prior knowledge, use of deductive reasoning skills, attention to 
appropriate distractors, and selection of strategies appropriate to 
correctly answering different types of items. 

Recently, LD students have been trained in using appropriate 
test-taking strategies. Results indicated that test scores of trained 
students improved as much as 8-10 percentile points on reading 
achievement tests over untrained control students (Scruggs & 
Mastropieri, in press). In addition, a separate investigation revealed 
that students' attitude toward tests qualitatively improved as a result 
of training. 



Should guessing and answer changing be 
encouraged? 

Usually students are advised not to guess on standardized multiple 
choice tests. However, according to Hammerton (1965) and Bauer (1973), 
testwise students tend to guess more often than their naive counter- 
parts, and as a result, obtain higher scores. Thus, an appropriate 
guessing strategy should be employed. 

Ebel (1965) concludes from his study with true/false tests that 
"students seeking highest scores on a test are well advised to answer 
all questions even when the usual correction is applied (their blind 
guesses to true/false tend to be correct more than half of the time)." 
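
The arithmetic behind this advice can be sketched as follows, assuming that "the usual correction" refers to the conventional formula score (number right minus number wrong divided by one less than the number of options). That reading, the function name, and the probabilities used are assumptions made for illustration.

```python
def expected_gain_per_guess(p_correct, n_options):
    """Expected change in a formula score (rights minus wrongs / (k - 1))
    from answering one item instead of omitting it."""
    penalty = 1 / (n_options - 1)
    return p_correct - (1 - p_correct) * penalty


# Blind guess on a true/false item: no expected loss.
print(expected_gain_per_guess(0.50, 2))
# Guess that is right 60% of the time on a true/false item: expected gain.
print(round(expected_gain_per_guess(0.60, 2), 2))
# Blind guess among four options: no expected loss.
print(round(expected_gain_per_guess(0.25, 4), 2))
```

Under this scoring rule a purely blind guess breaks even on average, so any guess that is correct more often than chance raises the expected score.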

The problem to solve now becomes "How does a test-taker decide 
which answer is the best guess?" Numerous testwiseness suggestions 
are provided by Millman's (1969) and Smith's (1982) guidelines. 

Beck (1978) studied the effect of changing item responses on scores 
of elementary school children on a standardized achievement test. 
Results clearly indicated that response changes on multiple-choice 
items tend to improve test scores. 

In spite of conventional wisdom regarding guessing and answer 
changing, research evidence indicates that: 

- Students should answer all questions, even when guessing is 
  penalized. 

- Students should be encouraged to change any answer they have 
  had second thoughts about. 



What should LD students be taught about 
test taking? 

Our recent research indicates that LD students benefit most from ex- 
tended, guided practice and general familiarity with test conventions 
and formats. To this end, LD students should be given relevant practice 
with questions and formats similar to those which they will see on 
achievement tests. (Students, of course, should not be given the exact 
items they will be tested on.) 

In addition, the following strategies have been successfully taught 
to LD students and have been effective in improving test scores: 

1. Never skip an answer. 

2. Be certain to attend to all distractors and refer to the reading 
passage, even if you are "very sure" your answer is correct. 

3. If you are having great difficulty reading a passage, read the ques- 
tions and try to answer them anyway. If you have difficulty with 
some words in the questions, or distractors, answer anyway and base 
your answers on the words you can read. 

4. If you have attended to all parts of a passage and test question and 
still do not know an answer, there is still a good chance of getting 
the correct answer if you guess. 

5. Be certain you are attending to the appropriate stimulus, such as the 
underlined sound in a "word study skills" subtest. As in other sub- 
tests, wrong answer choices are given which may look correct at 
first glance. 

6. Make sure you answer every item, even if you must hurry and guess 
a lot near the end. You will probably get some of the answers correct. 

Examples and practice activities will help develop these test-taking 
skills. 

How are your test-taking skills? 



1. The short story, "The Four Seasons," is about: 

a. vegetation in North America 
b. wind currents and their effects 
c. the changing weather 
d. the growth process 

2. The greatest advantage of using slent in the manufacture of steel is 
that slent makes steel 

a. transparent 
b. stainless 
c. heavy 
d. bulky 

3. The Japanese game of paduki 

a. can only be played by the Imperial Family 
b. is sometimes played indoors 
c. can never be played for more than 30 minutes 
d. is always played at every celebration 

4. When Bestor crystals are added to water 

a. heat is given off 
b. the temperature of the solution rises 
c. the solution turns blue 
d. the container becomes warmer 

The reasoning strategies are explained, followed by the correct answer. 

1. The convergence strategy (stem), recently described by Smith (1982), 
involves teaching test-takers to examine all choices presented after 
the stem of a multiple-choice question in order to analyze the 
relationships of the distractors to each other and, thereby, identify 
the choice most likely to be correct. (1. c) 

2. Absurd options can be eliminated as incorrect choices, thus 
increasing the probability of choosing the correct answer (Gibb, 1964). 
(2. b) 

3. Specific determiners (e.g., always, never, all) are words which 
provide cues to the likely correctness of choices, especially on 
true/false items (Slakter, 1970). (3. b) 

4. Identifying similar (but slightly different) options again narrows 
down the possibility of choosing incorrect answers (Millman, 1969). 
(4. c) 

Abstract 

Fifty-eight third graders from two elementary school classrooms 
were as<^igned at random to test-training and placebo groups. 
Students in the test-training group received six sessions of test- 
wiseness training specifically tailored to the Comprehensive 
Test of Basic Skills. Students in the placebo group received six 
sessions of creative writing exercises. The effectiveness of this 
training on achievement test scores was obscured due to the 
presence of ceiling effects. Supplementary analyses, however, 
provided some limited support for the effectiveness of this 
training. Trained and untrained groups were not seen to differ on
measures of on-task behavior during the testing situation. An
analysis of reported attitudes toward tests taken immediately
after the three-day testing period suggested that (a) the
standardized test experience was a stressful one for control
subjects, and (b) that the test-wiseness training had exerted a
significant ameliorating effect on attitudes in the treatment
group. Results suggested that test-wiseness training may reduce
levels of anxiety in elementary school children during test 
situations. 
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The Effects of Training in Test-Taking Skills on 
Test Performance, Attitudes, and On-Task 
Behavior of Elementary School Children 

In recent years, the effectiveness of coaching on 
achievement test performance has been well studied (see Sarnacki, 
1979, and Fueyo, 1976, for reviews). In a recent meta-analysis, 
Bangert-Drowns, Kulik, and Kulik (1983) determined that coaching 
for achievement tests in the elementary grades produced a 
generally facilitative effect (average effect size = .29) over all 
studies reviewed. More recently, Scruggs, Bennion, and White 
(1984) have argued that although training in test-taking skills 
does often produce an effect in the elementary school grades, this 
effect is dependent upon other factors, for example, length of 
training, age of students, and economic level of the students 
trained. Although researchers in the area of test-wiseness 
training have often examined variables in addition to actual test 
scores such as performance on test-wiseness tests and self-esteem, 
they have not addressed the issue of whether or not such training 
changes in any way the attitudes of elementary school children 
toward tests. This in itself would be an important finding for,
considering the degree to which school-age children are subjected
to testing procedures, it would be helpful to ensure that such
tests were not unnecessarily stressful. In addition, whether or
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not training in test-taking skills has a facilitative influence on 
the level of effort the students put into the test situation 
remains unclear. Such effort may be evaluated by means of the 
amount of time on-task students exhibit during standardized
testing. 

The present investigation was intended to address some of
these issues by providing training in test-taking skills to a
sample of third grade students and assessing, in addition to test
performance, reported attitudes towards the test-taking
experience and percent of time actually spent on-task during test
administration. Although the effects of test-wiseness training
have been well-documented in the past, the present investigation
was intended to shed some light on peripheral issues and to
address more specifically exactly what changes in attention and
attitude occur as a result of coaching on achievement tests.

Method 

Subjects 

Subjects were 58 elementary-age school children attending the
third grade in two different classrooms in a western rural school
district. Sex was evenly distributed. Subjects were selected at
random from both classes to participate in treatment and placebo 
groups. 
Materials 

Materials included a manual with six scripted 20- to 30- 
minute lessons in test-taking skills specifically tailored to the 
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reading subtests of the Comprehensive Test of Basic Skills (CTBS), 
Level E. These materials were developed specifically for this 
project and included student workbooks for practice activities by 
the students (Williams, 1984). 
Procedure 

Over a two-week period, treatment students were administered 
six lessons in test-taking skills appropriate to the reading 
subtest of the CTBS, by a trained, outside experimenter. These 
lessons included, for example, time-using strategies, deductive 
reading strategies, error avoidance strategies, and specific 
practice activities in each of the subtests. To control for 
possible Hawthorne effects, the placebo group was given six
exercises in creative writing by an outside experimenter at the 
same time treatment students were receiving test training. 
Within three days after the conclusion of training, students were 
given the CTBS by their regular classroom teachers in their 
regular instructional classes. During the taking of this test, 
observational measures were taken of on-task behavior of students 
by four trained observers unaware of group memberships of the 
students being observed. The observers employed a time-sampling 
procedure on an interval of 30 seconds. Each student 
observed was observed for 30 minutes. On-task behavior was
computed as percentage of times sampled on-task during actual test
performance and on-task behavior while directions were being
given. On-task behavior during directions was defined as
orientation of student's eyes toward either teacher or test
booklet and pencil-and-paper compliance with accompanying sample
activities. On-task during testing was defined as student's eyes
directed toward test booklet, pencil in hand, actively marking,
reading, or asking teacher direct questions with specific
reference to the test. After completion of the third and final
day of testing, students were given an attitude toward tests
questionnaire (see Figure 1). This questionnaire consisted of 10



Insert Figure 1 about here 



items in an agree/disagree format. Students completed the 
questionnaire together while the teacher read items to the class. 
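The time-sampling computation described in the Procedure can be sketched as follows. This is a minimal illustration only: the function name, the boolean coding of each interval, and the sample record are assumptions made for the example and are not taken from the project's observation forms.

```python
def percent_on_task(samples):
    """Return the percentage of sampled intervals coded on-task.

    `samples` holds one boolean per observation interval
    (True = student was on-task at the moment of sampling).
    """
    if not samples:
        raise ValueError("at least one sampled interval is required")
    return 100.0 * sum(samples) / len(samples)


# 30 minutes of observation sampled every 30 seconds gives 60 intervals.
INTERVAL_SECONDS = 30
SESSION_MINUTES = 30
n_intervals = SESSION_MINUTES * 60 // INTERVAL_SECONDS

# Hypothetical record: every fourth interval is coded off-task.
record = [i % 4 != 0 for i in range(n_intervals)]

print(n_intervals, percent_on_task(record))
```

Separate records for the directions and testing portions would yield the two on-task percentages reported in the Results.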

Results 

Achievement

Mean scores on the reading subtest of the CTBS were computed 
and compared statistically by means of t tests. As can be seen in 
Table 1, none of the group differences are statistically 
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significant. Interpretation is not possible, however, due to the 
presence of overwhelming ceiling effects exhibited on all 
subtests. 
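As a rough check on the group comparisons, an independent-samples t statistic can be recomputed from the summary values in Table 1. The sketch below assumes a pooled-variance t test, which the report does not state explicitly; the function name is illustrative.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic using a pooled variance estimate."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    t = (mean1 - mean2) / std_error
    df = n1 + n2 - 2
    return t, df


# Word attack row of Table 1: treatment (Tx) versus placebo (Cx).
t_word, df_word = pooled_t(29.79, 4.87, 29, 29.72, 5.37, 29)
print(round(t_word, 2), df_word)
```

Under this assumption the Word attack comparison comes out close to the tabled value of .05, which is consistent with the near-identical group means produced by the ceiling effect.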
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A supplementary analysis was conducted on the lower half of 
each group chosen by the previous year's total reading scores and 
is given in Table 2. This analysis indicates that standardized 
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gain scores between second and third grade testing were
significantly higher in favor of the treatment group on Word
Attack Subtest and Total Reading Score. 
On-Task Behavior 

Mean on-task behavior during directions, during testing, and 
total is given in Table 1. As can be seen, no significant group 
differences were found. 
Attitudes Toward Tests 

Reliability of the attitude measure was computed by means of
a Kuder-Richardson 20 formula and was found to be .88, indicating a
moderately strong degree of internal consistency for a measure of
this type. Differences between the mean scores of the two groups
were nonsignificant, t less than 1 in absolute value. An
inspection of Figure 2, however, shows that the distributions of
these two groups differ strongly. These differences are most
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obvious when one employs a curve-smoothing technique of combining 
the mean scores for each of two adjacent frequencies and are given 
in the same figure. The difference between these dispersions was 
tested statistically in two ways: mean differences from the mean 
in standard scores were computed for subjects in each group and 
compared statistically. The mean distance from the mean of the 
placebo group was statistically greater than the average distance 
from the mean in the training group (p < .01). In addition, a
Kolmogorov-Smirnov two-sample test (Siegel, 1956) was applied to
each half of the distribution. For the lower half of each
distribution (that is, students scoring 0 through 5 on the
measure), the distributions were statistically different (Z =
1.529, p < .02), while the upper half of each distribution was not
seen to differ significantly (Z = .756, p = .617).
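The two-sample Kolmogorov-Smirnov comparison can be sketched as below. The sketch assumes that the reported statistic is the large-sample Z (the maximum difference D between the two cumulative distributions, scaled by the sample sizes) and that the p values come from the asymptotic Kolmogorov series; the report does not say how they were obtained, although the two reported p values are consistent with this formula. The score lists are hypothetical and are not the study data.

```python
import math


def ks_two_sample_z(sample_a, sample_b):
    """Return (D, Z) for the two-sample Kolmogorov-Smirnov test."""
    n_a, n_b = len(sample_a), len(sample_b)
    d_max = 0.0
    for value in sorted(set(sample_a) | set(sample_b)):
        cdf_a = sum(x <= value for x in sample_a) / n_a
        cdf_b = sum(x <= value for x in sample_b) / n_b
        d_max = max(d_max, abs(cdf_a - cdf_b))
    z = d_max * math.sqrt(n_a * n_b / (n_a + n_b))
    return d_max, z


def ks_asymptotic_p(z, terms=100):
    """Two-tailed large-sample p value for a Kolmogorov-Smirnov Z."""
    if z <= 0:
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


# Hypothetical attitude scores (0 through 5) for two groups of ten.
placebo_low = [0, 0, 0, 1, 1, 2, 3, 4, 5, 5]
trained_low = [2, 3, 3, 4, 4, 4, 5, 5, 5, 5]
d_stat, z_stat = ks_two_sample_z(placebo_low, trained_low)
print(round(d_stat, 2), round(z_stat, 3), round(ks_asymptotic_p(z_stat), 3))

# p values implied by the two Z statistics reported in the text.
print(round(ks_asymptotic_p(0.756), 3), ks_asymptotic_p(1.529) < 0.02)
```

With small samples, exact tables such as those in Siegel (1956) may give somewhat different probabilities than the asymptotic series used here.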

Discussion 

The present investigation does not offer conclusive evidence 
that the particular training package employed significantly 
improved test scores, due to the ceiling effects reported in the 
Results section. However, it was found that students in the lower 
half of the treatment group exhibited statistically higher gain 
scores over the previous year's testing than did the lower half of
the placebo group. Particularly, this type of training has 
previously been seen to demonstrate a significant effect on a 
subtest similar to the Word Attack subtest in a sample of learning 
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disabled and behaviorally disordered children (Scruggs & 
Mastropieri, in press).

That achievement test coaching results in greater levels of 
on-task behavior on the part of students was not supported by the 
present investigation. Student on-task behaviors while listening 
to directions and while taking the test itself were very similar. 

Analysis of the attitude data did suggest that students in 
the treatment group reported more "normal" attitudes than those in 
the placebo group. The abnormal distribution of scores in the 
placebo group is highly reminiscent of that of a population under 
stress (see Wilson, 1973). The fact that the abnormally high 
number of very negative attitudes was not present in the treatment 
condition while the number of strongly positive attitudes was 
relatively similar suggests that this treatment may have 
contributed to more positive attitudes on the part of those 
students who may otherwise have developed strong negative 
reactions to the test and the test-taking situation. It should be 
noted here that completely positive attitudes toward tests were not
expected and are not necessarily a realistic expectation. What was
expected was a roughly normal distribution centering around the 
mean of about 5, which is in fact the distribution seen in the 
training group. The large proportion of extreme scores in the 
placebo group (with fully two-thirds of the scores within 1 point 
of 0 or 10) suggests that the population had been subjected to
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some stress and had reported widely polarized views on the test-
taking process. In the training group, these attitudes seemed to
have been ameliorated substantially.
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CTBS Reading Subtests

Variable             N      Mean       SD        t     2-tail prob.

Word attack
    Tx              29     29.79      4.87      .05       .959
    Cx              29     29.72      5.37

Vocabulary
    Tx              29     26.31      4.58     -.49       .624
    Cx              29     26.90      4.47

Comprehension
    Tx              29     26.48      4.06      .79       .434
    Cx              29     25.51      5.21

Total reading
    Tx              29     82.59     12.35      .13       .898
    Cx              29     82.14     14.04
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Variable                        N      Mean       SD        t     2-tail prob.

CTBS total battery
    Tx                         29    150.17     24.68     -.60       .549
    Cx                         29    154.03     24.10

Attitude toward test-taking
    Tx                         29      5.59      2.97      .59       .557
    Cx                         27      5.04      3.95

On-task during directions
    Tx                         18     45.28     15.78     -.75       .458
    Cx                         18     50.06     21.89

On-task during testing
    Tx                         18     77.67     16.18      .07       .941
    Cx                         18     77.28     14.98

Total on-task
    Tx                         18     65.78     14.76     -.45       .656
    Cx                         18     67.78     11.82
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Table 2

Gain Score Differences Between the Lower Half of Each Group (Chosen
by Last Year's Total Reading)

Variable                  N        X        SD     Error       T     Prob.

Word attack
    Tx                   12     25.83     39.55    11.42     2.41     .012
    Cx                   14    -20.86     47.06    12.58

Vocabulary
    Tx                   12     18.67     50.77    14.66      .49     .625
    Cx                   14      7.93     58.69    15.69

Comprehension
    Tx                   12     53.17     37.96    10.96     1.46     .158
    Cx                   14     24.79     57.54    15.38

Total of all subtests
    Tx                   12     97.67     52.64    15.20     2.51     .019
    Cx                   14     11.86    107.92    28.84
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Figure Captions 
Figure 1. Attitude measure.
Figure 2. Distribution of attitude scores.



Circle YES or NO.

Taking a test is my favorite thing to do at school.

Sometimes I am nervous when I take a test.

I look forward to taking a test.

I dislike taking a test when I don't know the answers.

I wish we had fewer tests.

Taking a test is always fun.

I like tests even when I don't know the answers.

Taking a test is one of the worst things about school.

I would rather do something else besides take a test.

I wish we had more tests.
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CAN LEARNING DISABLED STUDENTS EFFECTIVELY 
USE SEPARATE ANSWER SHEETS?¹



DEBRA TOLFA VEIT 
University of Alaska — Anchorage 



THOMAS E. SCRUGGS 
Utah State University



Summary: 100 regular class and learning disabled students were
administered three subtests of the Comprehensive Test of Basic Skills for
which all correct answers had been identified in the students' test booklet.
Analysis of the completed separate answer sheets indicated that learning
disabled students answered fewer total items than their nondisabled peers but
did not differ with respect to percent of items answered correctly. In
addition, nonsignificant differences were found for number of answer spaces
filled in outside the line. Implications for further research and training
are given.

In recent years, attention has focused upon the skills and strategies learning
disabled students apply independently to test-taking situations (Taylor &
Scruggs, 1983). Any observed deficiencies in these 'test-taking skills' could
be considered (a) a potential source of measurement error (Ebel, 1965) as
well as (b) a potential area for intervention. And, although research has
indicated that group-administered achievement tests can be reliable and valid
for learning disabled students (Price, 1984), some deficiencies in test-taking
skills have been observed in this population. Scruggs and Lifson (1985)
administered reading-comprehension questions to learning disabled and
nondisabled students without providing the accompanying reading passages.
They found that, although nondisabled readers were apparently able to make
use of such strategies as partial or prior knowledge, error avoidance,
elimination, and use of information from other test items, learning disabled
students were much less successful. Drawing upon a previous investigation
with mostly nondisabled students (Scruggs, Bennion, & Lifson, 1985a),
Scruggs, Bennion, and Lifson (1985b) recently interviewed learning disabled
and nondisabled students concerning the "test-taking strategies" they
spontaneously employed on reading achievement tests. It was concluded that
(a) learning disabled students were less successful in selecting strategies
appropriate for different types of test questions, and (b) they were less
successful at adapting to novel test formats. Given the number and frequency
of format changes on standardized achievement tests, these factors could
exert a potentially strong influence on the students' test performance
(Tolfa, Scruggs, & Bennion, 1985).

¹The research reported here was supported in part by a grant from the Department of
Education, Office of Special Education Programs, No. G008300008. The authors would
like to thank Mrs. Bonnie Olsen for her assistance with this project and Mary Ellen
Heiner for her assistance in the preparation of the manuscript. Address requests for
reprints to Thomas E. Scruggs, Developmental Center for Handicapped Persons, Utah
State University, Logan, UT 84322-6840.
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Another important format change which takes place on standardized tests
after the primary grades is the inclusion of separate answer sheets to facilitate
machine scoring. The ability to use separate answer sheets appears to be
developmental in nature, as students in Grades 1 and 2 show better performance
when test booklets are used than separate answer sheets (Ramseyer & Cashen,
1971). Cashen and Ramseyer (1969) indicated that the need to use test-booklet
marking decreases as the grade of the student increases. Typically,
standardized tests begin the use of separate answer sheets in Grade 4. The
implications for learning disabled students in Grades 4 or 5 functioning two
years behind peers in perceptual-motor skills become obvious.

In a recent meta-analysis of Wechsler Intelligence Scale for Children-Revised
(WISC-R) subtest scores of learning disabled children, Kavale and
Forness (1984) concluded that such students had as a group scored somewhat
lower on the WISC-R Coding subtest, which measures copying speed and
efficiency. The average scaled score equivalent on this test was 8.77,
representing performance at the 37th percentile. Whether such a performance
deficiency is sufficient to hamper seriously performance on separate answer
sheets, however, is unknown.

It has been suggested that students can be trained to use a separate answer
sheet (McKee, 1967; Ramseyer & Cashen, 1971). McKee (1967) described
training third graders to use separate answer sheets successfully. However,
this study represented a more subjective evaluation than a tightly designed
research study. Ramseyer and Cashen (1971) concluded that first and second
graders were unable to utilize separate answer sheets effectively even after
practice. Both studies (McKee, 1967; Ramseyer & Cashen, 1971) were
conducted with students in regular classrooms.

The present investigation examined the use of separate answer sheets with
learning disabled students in Grade 4. The study was conducted to determine
if, in fact, learning disabled students in Grade 4 use separate answer sheets less
efficiently than their normally functioning peers when relative ability to answer
test items is controlled. Although it is important to know the level of
deficiency exhibited by learning disabled students under actual test conditions,
this preliminary investigation was conducted to compare clerical performance
directly, without regard to relative item or subtest difficulty.

Method 

Subjects 

Subjects were 101 students enrolled in Grade 4 of an elementary school in a white,
middle-class, rural university community in northern Utah. All students were between
119 and 130 mo. of age and had not been previously promoted or retained relative to
other peers. Nineteen of these students (14 boys and 5 girls) were classified as learning
disabled according to P.L. 94-142 and Utah State guidelines, which include average 
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ability coupled with two years' discrepancy on standardized achievement tests. Average
Wechsler Intelligence Scale for Children-Revised (WISC-R) score for the learning
disabled group was 94.94 (SD = 8.81); average Total Reading grade equivalent score
from the Woodcock-Johnson scale was 2.63 (SD = .90). Eighty-two (47 boys and
35 girls) nondisabled students were functioning within the regular classroom. These
students were performing at or near grade-level and had not been identified as "gifted,"
"remedial," or identified for special services of any kind. Average Total Reading grade
equivalent from the Comprehensive Test of Basic Skills was 4.24 (SD = 1.42).
Intelligence test scores for this group were not available for this report but were presumed
to be only slightly higher than those of the learning disabled students.
Materials

Experimental materials consisted of the test booklet appropriate for the fourth grade 
Comprehensive Test of Basic Skills and its answer sheet. All correct responses had been 
marked with a black arrow in the test booklet. Subtests 1, 4, and 7 were selected as
target subtests. All subtests contained 45 questions. A presenter's script was prepared. 
Procedure

Nineteen learning disabled students and 82 regular fourth graders were
administered the three subtests by one of three examiners. Examiners were given a written
script to ensure all students received the same directions. All students were administered
the assignment in a group with the exception of three learning disabled students who
were administered the exercise individually in the resource room.

Students were told that they would be given a test that already had the correct
answers marked and that their task was to mark the correct answer on the separate answer
sheet. They were told to work as quickly and carefully as possible; they would be given
3 min. to work on each subtest. Students and examiners worked the examples together,
and examiners checked to ensure students were completing the correct subtest section
on the answer sheet and that students were working as quickly and efficiently as possible.

Answer sheets were scored by recording number of items completed, number of
items answered correctly, and number of items marked outside the established 5-mm
radius from the center of each answer circle for each subtest. This distance represented
the point at which the pencil mark could intrude into an adjacent answer space.
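The scoring rules just described can be sketched in code. The record layout below (the `Mark` class, its field names, and the sample marks) is hypothetical and introduced only for illustration; the study scored paper answer sheets by hand. Percentages are taken over marked items, as in the Results.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

RADIUS_MM = 5.0  # marks farther than this from the circle center count as outside


@dataclass
class Mark:
    item: int                        # item number on the answer sheet
    chosen: Optional[str]            # option filled in, or None if left blank
    keyed: str                       # option marked correct in the test booklet
    offset_mm: Tuple[float, float]   # (dx, dy) of the mark's farthest edge from center


def score_sheet(marks):
    """Count completed items and the percent correct and percent outside."""
    completed = [m for m in marks if m.chosen is not None]
    n = len(completed)
    correct = sum(m.chosen == m.keyed for m in completed)
    outside = sum(math.hypot(*m.offset_mm) > RADIUS_MM for m in completed)
    return {
        "completed": n,
        "pct_correct": 100.0 * correct / n if n else 0.0,
        "pct_outside": 100.0 * outside / n if n else 0.0,
    }


# Hypothetical four-item sheet: one blank, one miscopied, one stray mark.
sheet = [
    Mark(1, "a", "a", (1.0, 1.0)),
    Mark(2, "b", "c", (3.0, 4.5)),
    Mark(3, "d", "d", (0.0, 5.0)),
    Mark(4, None, "a", (0.0, 0.0)),
]
result = score_sheet(sheet)
print(result)
```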

Results and Discussion 
Each subtest was evaluated based on total number of items completed, total
percent marked correctly, and total percent marked outside the circle (i.e.,
more than 5 mm from the center). For total completed, students in the
nondisabled group obtained a mean score of 96.7 (SD = 18.8), while students in
the learning disabled group obtained a mean score of 86.2 (SD = 18.0). These
differences were statistically significant (t = 2.19, p = .03). For percent
of marked items answered correctly, however, differences were not observed.
Students in the nondisabled group recorded an average of 98% (SD = 6%)
of their answers correctly, while learning disabled students marked an average
of 96% (SD = 13%) of their answers correctly. Because observed ceiling
effects violated assumptions of normality and homogeneity, comparisons were
made by means of the nonparametric Mann-Whitney U test (Ferguson, 1982),
which yielded a nonsignificant difference (p > .20).




In addition, a nonsignificant difference was found when groups were
compared for percent of answer spaces marked outside the line (t(21) = 1.71, p =
.10) according to a separate variance estimate with a df correction (Ferguson,
1982). The nondisabled group marked a mean percent of 7.8 (SD = 8.6)
answers outside the line, while the learning disabled sample marked an average
percent of 13.0 (SD = 12.7) answers outside the line, assessed as a function
of total number of answers marked.
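A separate-variance t test with a corrected df can be sketched from the group summaries given above. The sketch assumes the Welch form of the test with the Satterthwaite approximation for degrees of freedom; the text names only "a separate variance estimate" with a correction, so the exact procedure is an assumption.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Separate-variance t statistic with Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Percent of answers marked outside the line:
# learning disabled (n = 19) versus nondisabled (n = 82).
t_out, df_out = welch_t(13.0, 12.7, 19, 7.8, 8.6, 82)
print(round(t_out, 2), round(df_out, 1))
```

Because the inputs are rounded summary values, the result only approximates the reported statistic of 1.71 with 21 degrees of freedom. The corrected df is far below the 99 that a pooled test would use, which reflects the much larger variance in the smaller group.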

In the present investigation, learning disabled students differed from
nondisabled students with respect to ability to utilize a separate answer sheet in
recording answers to standardized achievement test questions. These
differences were significant for speed of completion and nonsignificant for accuracy
and neatness, although differences were also noted in these areas. Present data
provided preliminary evidence that learning disabled students may differ in
certain aspects of clerical efficiency in using separate answer sheets, with
relative item or subtest difficulty controlled. Further research is needed, however,
to document the extent to which performance may be inhibited under
standardized conditions of test administration.

The present investigation was intended to provide a means for examining
possible differences between learning disabled and more average learners on
copying speed and efficiency on achievement-test answer sheets. The findings
of such a study are subject to several limitations. First, observed differences
may have been a function of motivational differences and not differences in
perceptual motor functioning between the two groups. And, although every
effort was made during task administration to encourage fast, efficient
performance of all students, such an alternative explanation is possible. Further
research efforts could include the provision of incentives to ensure motivation,
although such a procedure would further distance the task from standardized
administration procedures. Second, it could be argued that the present task
was not directly relevant to that of an actual testing situation. Although the
controls employed in this task did render it less 'real,' it is nonetheless unlikely
that a child who had difficulty with the presently employed task would not
have similar difficulties under administration conditions for standardized tests.
Finally, the lack of available IQs for one group prohibited evaluation of a
possible interaction with general intelligence. Although this issue deserves further
study, such related clerical tasks as the Coding subtest of the WISC-R have
previously indicated only a weak relation to general intelligence (Sattler, 1982)
and may not have played an important role in the present findings.

Two possible interventions can be imagined to correct such possible
difficulties. One possibility is to modify the tests, while the other possibility is
to train the students to be more efficient with separate answer sheets. And, in 
fact, such procedures have recently received attention. Beattie, Grise, and 
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Algozzine (1983) assessed the effectiveness of several test modifications,
including imbedding the answer circle within the test booklet, on the
competency-test performance of learning disabled students. Although some descriptive
advantages were noted, the over-all modifications did not produce any strong
consistent effect. With respect to the second possibility, attempts to train
learning disabled children in use of novel test formats, including separate answer
sheets, have been successful. Scruggs and Tolfa (1985) and Scruggs and
Mastropieri (in press) reported successfully teaching such 'test-taking skills' to
learning disabled students, so that test performance, subsequent to training,
was significantly higher than that of untrained controls. The fact that effect
sizes in these investigations were higher than those usually reported in the
literature (Scruggs, Bennion, & White, in press) supports the notion that learning
disabled students may indeed show relative deficits in a variety of 'test-taking
skills' (Scruggs & Lifson, 1985). Since training on the use of separate answer
sheets has not been specifically evaluated, however, the effectiveness of such
training is unknown. Further research can help to describe
such deficits and develop effective remediation.
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ATTITUDES OF BEHAVIORALLY DISORDERED STUDENTS 
TOWARD TESTS: A REPLICATION¹

DEBRA TOLFA, THOMAS E SCRUGGS, AND MARGO A. MASTROPIERI 
Utah State University

Summary: 48 behaviorally disordered and 48 more average students were
administered a test attitude survey immediately after district-wide standardized
achievement testing. Results were consistent with previous research which
suggested behaviorally disordered students may report poorer attitudes than their
more typical peers.

The behaviorally disordered student is classified on the basis of average
or near average intellectual ability in addition to social or emotional functioning
that is substantially different from that of other students the same age.
Behaviorally disordered students have repeatedly shown academic deficiencies
(Mastropieri, Jenkins, & Scruggs, in press; Motto & Wilkins, 1968; Stone &
Rowley, 1964). Several variables, including attitude toward school studies
(Silberberg & Silberberg, 1971), impulsivity (Letteri, 1979), and responses
toward test-taking situations (Forness & Dvorak, 1982; Scruggs & Mastropieri,
in press), have been identified as possible contributing factors to academic
deficiencies.

The present study investigated behaviorally disordered students' attitude
toward test-taking situations. In the Scruggs, Mastropieri, Tolfa, and Jenkins
(1985) study, conflicting results were found. In Study 1, responses of fifth
and sixth grade behaviorally disordered students were compared with those of
their normally functioning peers on a 12-item survey of test attitudes. The
behaviorally disordered students differed significantly from their normally
functioning peers on the over-all survey as well as the specific factors involving
subjective feelings about tests and feelings about the personal importance of
tests. Groups did not differ with respect to evaluation of the objective value
of tests. The sample in this study was relatively small (N = 37), however,
and the survey contained too few items to draw firm conclusions.

In Study 2 of the same investigation, 75 students in regular classrooms
and 25 behaviorally disordered students from self-contained rooms were
administered a longer survey. Groups, which were equivalent with respect to
number, age, sex, and grade, were then compared. There was no difference between
¹The research described here was supported in part by a grant from the Department of
Education, Special Education Programs, Office of Special Education and Rehabilitative
Services, No. G008300008. The authors thank Ms. Cathy Smith, Coordinator of Special
Education, Hillview Elementary School, Salt Lake City, Utah, for her cooperation as well
as Ursula Pimentel and Mary Ellen Heiner for their assistance in the preparation of this
manuscript. Address requests for reprints to Thomas E. Scruggs, UMC 68, Utah State
University, Logan, UT 84322.
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groups' scores on the total survey, on "personal feeling" items, or on "value of
tests" items. Scruggs, et al. (1985) concluded that further research was
necessary.

The present investigation was conducted to provide further information
on the attitudes of behaviorally disordered students toward tests. A larger
population, including more grades, was compared on a test attitude survey
utilized in Study 2 of Scruggs, et al. (1985). In addition, since sex differences
have been previously reported on attitude surveys (Scruggs & Mastropieri,
1983), evaluation of sex differences or a possible interaction of group by sex
was made.

Method 

Subjects 

Subjects were 96 elementary school children attending a public school in
a western metropolitan community. Students were enrolled in Grades 1
through 6. Of these students, 48 were classified as behaviorally disordered,
while 48 were more typical students enrolled in regular classrooms of that
school. To be included in the study from the regular classroom, students were
selected at random, using a stratified random sampling technique, from 122
students representing the same grades. When possible, equal numbers of boys
and girls per grade were selected to match numbers represented in the target
population. The breakdown by grade and sex for each group was as follows:
3 students (1 boy, 2 girls) were enrolled in Grade 1, 8 students (5 boys, 3
girls) in Grade 2, 4 boys in Grade 3, 8 students (6 boys, 2 girls) in Grade 4,
11 students (behaviorally disordered = 9 boys, 2 girls; regular class = 6 boys,
5 girls) in Grade 5, and 14 students (behaviorally disordered = 11 boys, 3
girls; regular = 9 boys, 5 girls) were enrolled in Grade 6.

Students were identified as behaviorally disordered according to state and
P.L. 94-142 guidelines, which included students exhibiting behavior or
emotional conduct over time which adversely affected educational performance and
required special education services in self-contained classrooms.

Procedure

The 22-item Test Attitude Survey used by Scruggs, et al. (1985, Study 2)
was given. This survey contained such items as "tests are an important part
of school," "tests are more important to the teacher than to me," "tests are a
waste of time," "I try my best when I take a test," and "I do poorly on tests."
Items were intended to tap students' feelings of the importance of tests to
themselves and to parents and teachers, as well as their own feelings toward tests.

The measure was administered immediately subsequent to yearly achievement
testing. Administration of the survey was conducted in the students'
regular classroom, and items were answered together as the teacher read each
item aloud. Students were given 1 point for a positive response (i.e., "yes"
to a positive statement, or "no" to a negative statement) and 0 points for a
negative response.
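
The scoring rule described above can be illustrated with a minimal sketch. The function name, the item numbers, and the example responses are hypothetical and are not taken from the survey itself; only the 1-point and 0-point rule comes from the text.

```python
def score_survey(responses, positive_items):
    """Return the total attitude score for one student.

    responses: dict mapping item number -> "yes" or "no" (hypothetical layout).
    positive_items: set of item numbers that are worded positively.
    """
    total = 0
    for item, answer in responses.items():
        positively_worded = item in positive_items
        # 1 point for "yes" to a positive item or "no" to a negative item,
        # 0 points otherwise.
        if (answer == "yes") == positively_worded:
            total += 1
    return total


# Hypothetical three-item example: items 1 and 3 are positive, item 2 is negative.
example = {1: "yes", 2: "no", 3: "no"}
print(score_survey(example, positive_items={1, 3}))
```

With 22 items scored this way, totals can range from 0 to 22, with higher totals indicating more positive reported attitudes.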

Results

The reliability (Kuder-Richardson 20) of the present survey for this
sample was .75, which indicated moderate reliability. Data were entered into
a 2 (group) X 2 (sex) analysis of variance which was significant for groups
(F(1,92) = 17.79, p < .001). No significant main effect was found for sex
(F(1,92) = 2.15, p = .15). Also, their interaction only approached significance
(F(1,92) = 3.10, p = .08). Descriptive data are given in Table 1.
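
For readers unfamiliar with the Kuder-Richardson 20 coefficient reported above, the following is a minimal sketch of the standard formula for dichotomous (0/1) items. It assumes the population variance of total scores is used, and the small response matrix is invented for illustration; it is not the study's data and does not reproduce the reported value.

```python
from statistics import pvariance


def kr20(item_matrix):
    """KR-20 for a list of per-student lists of 0/1 item scores."""
    n_students = len(item_matrix)
    k = len(item_matrix[0])  # number of items
    totals = [sum(row) for row in item_matrix]
    var_total = pvariance(totals)  # assumption: population variance
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_matrix) / n_students  # proportion scoring 1
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Invented data: five students by three items.
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]
print(round(kr20(data), 3))
```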

TABLE 1

Means and Standard Deviations on Test Attitude Survey

Sex        Average Students        Behaviorally Disordered Students
             M         SD               M         SD

Total      18.02      2.29            15.33      4.07
Girls      18.06      2.08            13.36      5.08
Boys       18.00      2.43            15.92      3.59

Discussion 

The present investigation replicated the findings of Study 1 of Scruggs,
et al. (1985) and suggests that behaviorally disordered children report less
positive attitudes toward test-taking situations than their more normally functioning
peers. This study also extended previous findings to include Grades 1 through 6.

Interpretation of the present findings must be made with caution,
however, since on at least one previous occasion, behaviorally disordered students
did not report more negative attitudes than did their regular class peers. These
discrepant findings may be due to inconsistencies in samples or may reflect
other variables, such as time of year or prior test experience, which could only
be uncovered through further research. These results suggest that behaviorally
disordered students differ from their normally functioning peers in test-taking
attitudes. Clarifying basic relations between test scores and test attitudes of
behaviorally disordered students requires further study.

Abstract

Eighty-five mildly handicapped (learning disabled or behaviorally
disordered) students were assigned at random to either a control
condition or a condition in which students received five days'
training on test-taking skills relevant to the Stanford
Achievement Test. Results of test scores indicated that trained
students scored significantly higher on tests of reading decoding
and math concepts. A significant univariate interaction between
experimental group and handicapping condition suggested that
students classified as behaviorally disordered had differentially
benefited on the math concepts subtest. Finally, a descriptive
but non-significant difference favoring trained students was found
on the math computation subtest. Implications for special
education are given.

The Effects of Coaching on the Standardized Test
Performance of Learning Disabled and
Behaviorally Disordered Students

In recent years, researchers have attempted to identify
sources of measurement error (Ebel, 1965) in handicapped
populations. Such research is of importance because handicapped
children are often among those most frequently tested in public
schools, and because these populations have not been
systematically represented in test standardization procedures
(Fuchs, Fuchs, Benowitz, & Barringer, in press). Testing
influences research has generally focused on the following issues:
examiner effects (e.g., Fuchs, Fuchs, Dailey, & Power, 1985), test
anxiety and attitudes (e.g., Bryan, Sonnefeld, & Grabowski, 1983;
Tolfa, Scruggs, & Mastropieri, 1985), and test-taking skills, or
"test-wiseness" (Millman, Bishop, & Ebel, 1965).

In the area of test-taking skills, recent research has
supported the notion that learning disabled (LD) and behaviorally 
disordered (BD) students exhibit deficiencies in this area with 
respect to standardized achievement tests. LD students have been 
seen to exhibit deficiencies in the use of prior knowledge and 
deductive reasoning strategies (Scruggs & Lifson, 1984), selection 
of appropriate strategies and attention to appropriate format 
features (Scruggs, Bennion, & Lifson, 1985 a, b), and effective 
use of separate answer sheets (Tolfa-Veit & Scruggs, in press). 
Although standardized achievement tests have generally been found
to be reliable and valid with mildly handicapped students (e.g.,
Pierce, 1984), results of the above test-taking skills research 
suggest that measurement error could be reduced (and consequently, 
scores improved) if mildly handicapped students could be 
successfully trained in test-taking skills. 

Some researchers have attempted to increase LD students' test 
performance by modifying the test formats. Beattie, Grise, and 
Algozzine (1983) modified the line lengths, item groupings, answer 
formats, size of print, and administration procedures on the State 
Student Assessment Test, Florida's minimum competency test, and 
administered the modified versions to third grade LD students.
They provided some evidence that such modifications enhanced LD
students' performance on competency tests. Statistical
confirmation of these findings, however, was lacking.

Other research has emphasized training or coaching students 
to be better test-takers. Much research has been conducted in the 
area of training in test-taking skills, but little of this 
research has addressed handicapped populations. In a recent meta- 
analysis, Scruggs, Bennion, and White (in press) examined the 
effects of such coaching on achievement test scores of elementary 
school children. They concluded that, in general, coaching had a
very small overall effect on test scores, with somewhat larger
effects being found for younger students, lower SES students, and
students who had undergone longer training periods. No research
was located in which mildly handicapped students had been trained,
although, more recently, such training has been accomplished.
Scruggs and Tolfa (1985) reported that a small sample of trained 
LD students had scored higher than controls on standardized word 
analysis test items, while no differences were found for reading 
comprehension items. These same findings were replicated by 
Scruggs and Mastropieri (in press b) using a larger subject sample 
of LD and BD students. It was concluded that such training could 
have a strong facilitative effect (8-10 percentile points) on 
reading subtests with more complicated format demands, as 
suggested by Tolfa, Scruggs, and Bennion (1985). 

The findings of Scruggs and Mastropieri (in press b) and 
Scruggs and Tolfa (1985), although encouraging, left several 
issues unaddressed. First, the subjects in these investigations 
were mostly primary level students, generally less familiar with 
testing situations than older students. It would be of interest 
to know whether upper elementary students could benefit from such 
training. Second, training was only given in reading subtest 
areas, leaving open the question of whether such training could
facilitate performance on mathematics subtests. Finally, only the 
Scruggs and Mastropieri (in press b) investigation included BD 
students, and in that study, students were not stratified by 
handicapping condition and therefore analysis of any possible 
treatment by handicapping condition interaction was not possible. 
It was, therefore, the purpose of the present research to
replicate and extend previous findings of training in test-taking
skills to include (a) upper elementary students, (b) mathematics
as well as reading subtests, and (c) separate analysis of test
performance by different handicapping condition.

Method 

Subjects 

Subjects were 85 LD and BD students attending public schools 
in a western metropolitan area. Forty-four students had been 
classified as learning disabled, and 41 students had been 
classified behaviorally disordered by national, state, and local 
standards. These standards included, for LD students, a 40% 
discrepancy (expressed in standard scores) between ability, as 
assessed by individual intelligence tests, and two areas of 
academic achievement. Although LD students in the present sample 
exhibited discrepancies in several different content areas, the 
majority had been referred for deficiencies in reading, although 
they also exhibited deficiencies in mathematics functioning. These 
deficiencies were not considered by school personnel to be due to 
emotional disturbance. In contrast, behaviorally disordered 
students exhibited deficiencies in social or emotional 
functioning, as a primary indicator, which interfered with 
classroom learning. These referrals were made for several 
different reasons, but in most cases students had exhibited 
aggressive, non-compliant, or anxiety-governed behaviors which had 
interfered with routine classroom activities. All students were
receiving special services in which structured academic
environments were provided. 

The sample included 21 4th, 38 5th, and 26 6th grade
students, composed of 63 boys and 22 girls. Mean Wechsler
Intelligence Scale for Children--Revised (WISC-R) score for the LD
students was 97.73 (SD = 8.29). Mean WISC-R score for the BD students
was 92.80 (SD = 10.81). Achievement test scaled scores from the
previous year's testing with the Stanford Achievement Test (SAT)
were as follows: LD students, 572.96 (SD = 26.35) Total Reading,
580.62 (SD = 23.54) Total Math; BD students, 570.64 (SD = 37.26)
Total Reading, and 559.71 (SD = 31.05) Total Math. Complete SAT
data, however, were available for only 75% of the subjects. More
complete academic test information is given in the Results
section.

Materials

Materials were developed specifically for the present 
investigation and consisted of (a) a practice test booklet with 
correct answers identified for practice with separate answer 
sheet, and (b) a practice test booklet with unmarked problems 
similar to, but not identical to, items in the SAT. Items were 
included which resembled those in two reading subtests (reading 
comprehension, word study skills) and three math subtests 
(concepts, computation, and word problems). A complete set of 
these materials is given in Scruggs (1985).

Procedure

Students were stratified by grade level and handicapping 
condition, and assigned at random to either a training or a no- 
treatment control condition. Students were not stratified by sex, 
but similar proportions of girls and boys were represented in 
handicapping and treatment conditions. Training condition 
students were seen in small (1-6) groups by one of three trained 
experimenters for five 20-30 minute sessions. In the first
session, students were given instruction and practice in the use
of separate answer sheets using a practice test booklet for which 
correct items had been indicated with an arrow. Students were 
instructed in finding and monitoring their place on the answer 
sheet, marking and erasing carefully, and in checking their work. 
The second and third sessions consisted of training in reading 
subtests. For the reading comprehension subtest, students were 
taught to refer back to the passage for recall questions, to use 
deductive reasoning strategies for inference questions, and to 
look for similarities between phrases or words in the passage and 
answer choices. For the word study skills subtest, students were 
taught to attend to appropriate cues and sounds, rather than 
letter similarities in stem and option. The fourth and fifth 
sessions covered strategies appropriate to math subtests. For the
math concepts subtest, students were taught to attend carefully to 
specific format demands. For the computation subtest, students 
were taught to carefully recopy problems on scratch paper in the
most familiar form. Finally, on the word problems subtest,
students were taught to attend to command words in the problem and 
work problems carefully on separate paper. On all subtests, 
students were taught to (a) work quickly and carefully, (b) check 
answers if time permits, (c) answer all questions, (d) eliminate 
answers known to be incorrect, (e) incorporate prior or partial 
knowledge, and (f) become familiar with all subtest format 
demands. 
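
The assignment step described at the start of this section (stratifying by grade level and handicapping condition, then assigning at random within strata) can be sketched as follows. This is an illustration only: the function name, the roster layout, and the handling of odd-sized strata are assumptions, since the report does not describe the mechanics of the randomization.

```python
import random
from collections import defaultdict


def stratified_assignment(students, seed=None):
    """students: list of (student_id, grade, category) tuples.

    Returns a dict mapping student_id -> "training" or "control".
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for student_id, grade, category in students:
        strata[(grade, category)].append(student_id)

    assignment = {}
    for members in strata.values():
        rng.shuffle(members)
        half = len(members) // 2
        # Assumption: in an odd-sized stratum, a coin flip decides which
        # condition receives the extra student.
        if len(members) % 2 and rng.random() < 0.5:
            half += 1
        for i, student_id in enumerate(members):
            assignment[student_id] = "training" if i < half else "control"
    return assignment


# Hypothetical roster: four Grade 4 LD students and four Grade 5 BD students.
roster = [(i, 4, "LD") for i in range(4)] + [(i + 10, 5, "BD") for i in range(4)]
print(stratified_assignment(roster, seed=1))
```

Shuffling within each stratum keeps grade level and handicapping condition balanced across the two conditions, which is what makes a later treatment by handicapping condition analysis possible.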

The next week after training, all students were administered 
the Stanford Achievement Test by regular school personnel.
Completed answer sheets were machine scored. 

Results 

Scaled scores were chosen for the present analysis because of 
their consistency across grade levels and their suitability for 
meeting the assumptions of analysis of variance (ANOVA). Means 
and standard deviations are in Table 1. It should be pointed out, 
however, that a separate analysis using percentile scores produced 
results virtually equivalent to those presented here. Since the 
five subtests were correlated, but not drawn from the same 
universe (Scruggs & Mastropieri, in press a), a multivariate 
analysis of variance (MANOVA) was conducted on the five subtest 
scaled scores, using handicapping condition (LD vs. BD) and 
condition (treatment vs. control) as independent variables.
Although the multivariate effect for handicapping condition did
not approach significance, F(5,77) approximation = 1.69, p = .147,
the condition effect revealed an F(5,77) approximation of 2.30, p
= .05. Follow-up univariate F tests for condition revealed a
significant effect for the Word Study Skills, F(1,81) = 4.56, p =
.04, and the Math Concepts, F(1,81) = 4.92, p = .03, subtests.
With the exception of Math Computation, F(1,81) = 1.77, p = .19,
all other Fs (Reading Comprehension, Math Applications) were less
than unity. Obtained effect sizes on scaled scores are presented
in Table 2. MANOVA revealed no significant multivariate
interaction effect, F(5,77) = 1.69, p = .15. Further univariate
tests revealed one significant group by condition interaction,
F(1,81) = 6.03, p = .02, on the mathematics concepts subtest (see
Figure 1).



Insert Tables 1 and 2 and Figure 1 about here 



Discussion 

The findings of the present investigation replicate the
findings of Scruggs and Tolfa (1985) and Scruggs and Mastropieri
(in press b), extend them to upper elementary grades and
mathematics subtests, and allow comparison of LD vs. BD student
performance. That trained students outperformed controls on word
study skills and mathematics concepts subtests supports the
hypothesis of Tolfa, Scruggs, and Bennion (1985) that tests with
more complicated formats may prove differentially difficult for
mildly handicapped students. That is, the word study skills and
mathematics concepts subtests each contained several potentially
confusing format changes (e.g., different formats for testing
"syllabication" and "decoding" skills within the same word study
skills subtest). Reading comprehension and math applications
(i.e., word problems) subtests, however, contained more "obvious"
format demands, and fewer format changes. Resulting effect sizes
of these scores were generally higher than those of nonhandicapped
children (M ES = .10) (Scruggs, Bennion, & White, in press).

The obtained univariate interaction by handicapping condition
on the mathematics concepts subtest may simply represent a Type I
error (due to the lack of significant multivariate interaction
effect), but certainly deserves further research attention.
Mathematics functioning has sometimes been noted as an area of
particular difficulty for BD students (Mastropieri, Jenkins, &
Scruggs, 1985), perhaps reflecting problems with attention and
persistence of effort. It is, therefore, possible that the LD
children benefitted most from techniques which involved planning
an approach to a problem such as sequencing and organizing;
whereas, the BD children additionally benefitted from techniques
which enhanced concentration and sustained attention.² Additional
research can help clarify these issues.

The results of this and previous investigations suggest that
LD and BD students possess more knowledge than they are able to
demonstrate on standardized tests; therefore, scores of
students who have been taught test-taking skills are more valid
indicators of their true abilities. It could be argued, on the
other hand, that problems with "test-taking skills" (e.g., failing 
to check work) are representative of classroom problems and 
therefore untrained student scores would be more valid estimates 
of classroom performance. Such arguments await further empirical 
verification. 

Another argument, which we have made previously (Mastropieri 
& Scruggs, 1984), reflects what we have considered the dual role 
of the special education teacher. To a greater extent than the
regular classroom teacher, the special education teacher has a 
responsibility not only (a) to teach specific skills and content, 
but also (b) to teach the student how to apply these skills in 
appropriate contexts outside the special education setting. Not 
to do so constitutes an incomplete fulfillment of this 
responsibility. To this extent, training in test-taking skills 
seems not only justifiable, but a necessary component of general 
teaching strategies to promote generalization and transfer of 
learned information. 

The results of this and previous research indicate that 
test-taking skills can be trained to mildly handicapped elementary 
age students, and that this training can significantly impact on 
test performance. Future research efforts are needed to assess 
whether similar training can also benefit secondary level mildly 
handicapped students, and whether training can improve scores on
teacher-made tests. The present authors are currently
investigating such possibilities (Taylor & Scruggs, 1983). 

Table 1

Standard Score Means and Standard Deviations

                               Learning Disabled             Behaviorally Disordered
Subtest                     Treatment       Control         Treatment        Control

Reading Comprehension      588.43(33.1)   599.96(25.8)    604.33(50.57)   591.91(24.7)
Word Study Skills          600.38(22.6)   587.30(25.6)    608.22(39.6)    594.70(26.0)
Mathematics Concepts       593.57(27.2)   595.04(21.7)    620.17(42.7)    591.13(20.9)
Mathematics Computation    601.76(39.5)   599.65(29.6)    609.06(40.9)    591.22(27.7)
Mathematics Applications   595.62(28.7)   593.87(29.3)    591.50(33.0)    584.09(27.7)

Table 2

Obtained Effect Sizes

Subtest                    Effect Size*

Reading Comprehension          .10
Word Study Skills              .53
Math Concepts                  .59
Math Computation               .47
Math Applications              .15

*All effect sizes were computed on scaled scores using the control
group standard deviations as divisor and E-C mean differences in
the numerator.

Figure Caption

Figure 1. Mathematics concepts subtest: Group by condition
interaction.

Perceptual and Motor Skills, 1985, 60, 847-850. © Perceptual and Motor Skills 1985

IMPROVING THE TEST-TAKING SKILLS OF
LEARNING DISABLED STUDENTS¹

THOMAS E. SCRUGGS AND DEBRA TOLFA
Utah State University

Summary. — 16 learning-disabled second- and third-grade students were
matched on previous year's achievement scores and grade and assigned at
random to experimental and control conditions. Students in the experimental
condition were given 8 20-min. sessions of training in test-taking skills
particular to the Stanford Achievement Test. Analysis of test scores indicated
trained students scored significantly higher on one subtest of a shortened
version of the test than students who had not been trained.

Since the seminal article by Millman, Bishop, and Ebel in 1965 on
test-wiseness, or test-taking skills, interest has grown in the construct of
test-wiseness as a possible source of measurement error (3). Although some specific
groups and populations have been said to be low in "test-wiseness" (9), the
issue of whether or not students classified as learning disabled exhibit the
same test-taking skills as nondisabled peers has only recently been investigated
(10). Scruggs and Lifson (7) administered reading comprehension test items
with accompanying passages deleted to groups of learning-disabled and
nondisabled students. Their results indicated that, although nondisabled students
were able to take advantage of prior or partial knowledge and deductive
reasoning strategies to answer most of the questions correctly, learning-disabled
students were less able to utilize these strategies. In another investigation (6)
learning-disabled and nondisabled students were interviewed regarding their
strategies on reading-achievement-test items. Results suggested that
learning-disabled students were less likely than their nondisabled peers to apply
"appropriate test-taking strategies" to reading-comprehension-test items and
learning-disabled students were more likely than nondisabled peers to be misled by
particular format demands on tests of "word-study skills" (i.e., phonetic
analysis).

Although the above research indicates that learning-disabled students may
be lacking with respect to specific test-taking skills, this research does not
indicate that these students can easily be taught these skills to the extent that
achievement-test performance would improve. In fact, little is known about
teaching test-taking skills to learning-disabled students. Recently, Dunn (2)
successfully taught test-taking skills to a sample of junior high school-age



¹This research was supported in part by a grant from the Department of Education, Office
of Special Education, No. G008300008. The authors thank Marilyn Tinnakul and Mary
Ellen Heiner for their assistance in the preparation of the manuscript. Address requests
for reprints to Thomas E. Scruggs, Ph.D., UMC 68, Developmental Center for
Handicapped Persons, Utah State University, Logan, Utah 84322.
learning-disabled students, but to date, test-taking skills have not been taught
elementary-aged learning-disabled students. The purpose of the present
research was to determine whether specific test-taking skills could be taught to
elementary-aged learning-disabled students to improve their performance on
standardized achievement-test items.

Method 

Subjects were 16 second- and third-grade learning-disabled students attending
special education classes in a western metropolitan area.² Criteria for placement as
learning disabled included average intelligence coupled with 40% discrepancy between
ability and at least two areas of academic functioning. Although IQs were not available
for this study, all students were said to have been functioning within a normal range of
intelligence. Students were individually matched on the basis of grade and previous
year's reading test scores and assigned at random to either experimental or control groups.
Average reading percentile was 29.0 (SD = 18.5) for the experimental group and 28.3
(SD = 19.7) for the control group. Average age for each group was 7 yr., 8 mo. (SDs =
8 mo. and 6.5 mo.; ranges = 7 yr. to 8 yr., 4 mo. and 7 yr., 1 mo. to 8 yr., 6 mo.,
respectively, for experimental and control groups). Five (62.5%) second graders and three
(37.5%) third graders were in each group; the experimental group contained four girls
and four boys, while the control group contained three girls and five boys.
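
The matching and random assignment described above might be carried out along the following lines. This is a sketch under stated assumptions: the function name and data layout are hypothetical, students are paired with their nearest neighbor on prior reading percentile within grade, and each grade is assumed to contain an even number of students (as in the present sample of ten second graders and six third graders). The report does not state how pairs were actually formed.

```python
import random


def matched_pair_assignment(students, seed=None):
    """students: list of (student_id, grade, prior_percentile) tuples.

    Returns (experimental_ids, control_ids).
    """
    rng = random.Random(seed)
    # Sorting by grade, then by prior score, places the closest matches
    # next to each other (assumes an even number of students per grade).
    ordered = sorted(students, key=lambda s: (s[1], s[2]))
    experimental, control = [], []
    for i in range(0, len(ordered) - 1, 2):
        pair = [ordered[i], ordered[i + 1]]
        rng.shuffle(pair)  # random assignment within the matched pair
        experimental.append(pair[0][0])
        control.append(pair[1][0])
    return experimental, control


# Hypothetical students: (id, grade, previous year's reading percentile).
sample = [("a", 2, 20), ("b", 2, 25), ("c", 3, 30), ("d", 3, 35)]
print(matched_pair_assignment(sample, seed=0))
```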

Materials were eight scripted lessons for each grade in a direct-instruction format
and accompanying workbooks for students which included pencil-and-paper practice
activities.³ All items were similar to, but not exact items from, the Stanford Achievement
Test. The general test-taking strategies taught in these materials included attending,
marking answers carefully, choosing the best answer carefully, error-avoidance strategies,
and appropriate situations for soliciting the teacher's attention. Specific test-taking
strategies were taught for each reading subtest in the Stanford Achievement Test. These
included structured practice on specific test formats for each subtest, and specific
application of general test-taking strategies to each specific subtest. For example, with respect
to the "letter-sound" component of the Word Study Skills subtest, students were taught
to employ the following sequence of strategies: Look at and read the first word.
Pronounce to yourself and think of the sound of the underlined letter. Carefully look at
the underlined choices and choose the word with the same sound as the underlined letter.
If you don't know all the words, read the words you do know or read parts of individual
words you may know. If you're not sure of the answer, see if there are some answers
that you are sure are not correct and eliminate those. Color in the answer quick, dark,
and inside the line. Guess if you are still not sure; never skip an answer.

Experimental subjects were taught in small groups for four 20-min. lessons per week
for 2 wk. Positive responding and attention to task were reinforced with stickers.

The first seven sessions taught the use of test-taking strategies within the specific
context of each of the reading-related subtests. The last session consisted of a general
review of all previous procedures. Each day of instruction involved extensive work with
practice activities applied to practice test items. Students were given no information
concerning the content of the actual test not specified in the published test directions.

²A small group of fourth-grade learning-disabled students was originally intended for
inclusion in the study but had to be dropped because attrition and methodological
problems were associated with the test administration for this group.
³T. E. Scruggs & J. Williams. SUPER SCORE test-taking manuals and workbooks.
(Unpublished training materials, Utah State University, 1981)
Following the last training procedure, as a posttest, all trained and control students were
administered shortened versions of the reading subtests of the Stanford Achievement Test.
Items were taken from the Primary 2 level, Form E and Primary 3 level, Form E.
The shortened version for Primary 2 level included the first 13 items on the
Comprehension subtest and the first 16 items on the Word Study Skills subtest. The shortened
version for the Primary 3 level included Items 9 to 22 on the Comprehension subtest and
Items 1 to 9 and 19 to 32 on the Word Study Skills subtest. The Primary 2 test had a
total of 13 Comprehension questions and 16 Word Study questions, while the Primary 3
test had a total of 14 Comprehension questions and 23 Word Study questions. The
number of items was chosen for each condition to represent the number of items expected
to be completed in 20 min., according to directions. Although the subtests were shortened
to accommodate the students' scheduling constraints, standardization procedures were
followed in the administration of the test, which was done in the resource setting by an
administrator unfamiliar to the students and unaware of group membership of the
students. Percent correct scores were analyzed instead of mean number correct because
there was a different total number of items for each subtest and level.
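
The reason for analyzing percent correct can be seen in a short sketch. The item counts are those given in the text; the function name and the raw scores in the example are hypothetical.

```python
# Item counts of the shortened subtests, as stated in the text.
ITEM_COUNTS = {
    ("Primary 2", "Comprehension"): 13,
    ("Primary 2", "Word Study Skills"): 16,
    ("Primary 3", "Comprehension"): 14,
    ("Primary 3", "Word Study Skills"): 23,
}


def percent_correct(level, subtest, number_correct):
    """Convert a raw score to percent correct for the given level and subtest."""
    total_items = ITEM_COUNTS[(level, subtest)]
    if not 0 <= number_correct <= total_items:
        raise ValueError("number_correct is outside the range of the subtest")
    return 100.0 * number_correct / total_items


# A hypothetical raw score of 12 on Word Study Skills at each level.
print(percent_correct("Primary 2", "Word Study Skills", 12))  # 75.0
print(round(percent_correct("Primary 3", "Word Study Skills", 12), 1))
```

The same raw score corresponds to different proportions of the test at the two levels, so raw scores could not be pooled across second and third graders.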

Results and Discussion

Percent correct scores for experimental and control students were
compared statistically by means of t tests for independent means,⁴ for Word Study
Skills, Reading Comprehension, and combined subtests. Descriptively,
experimental students scored an average of 77.1% (SD = 13.6), 48.9% (SD =
32.3), and 63.0% (SD = 20.6), for Word Study Skills, Reading Comprehension,
and combined subtests, respectively. Control students, by contrast, scored
56.8% (SD = 20.1), 50.3% (SD = 24.3), and 55.4% (SD = 15.1) on the
same subtests. The only significant difference between groups was on the
Word Study Skills subtest (t(14) = 2.38, p = .03). Differences were not found
on either the Reading Comprehension (t(14) = -.10) or the total subtest (t(14)
= 1.05) scores.
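
As a check on the arithmetic, the t ratios for the two individual subtests can be approximated from the reported means and standard deviations. The sketch below assumes the usual pooled-variance t test for independent means with eight students per group; the function name is hypothetical, and small discrepancies from the reported values would be expected because the published means and standard deviations are rounded.

```python
from math import sqrt


def independent_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t ratio and degrees of freedom from summary statistics."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, n1 + n2 - 2


# Reported percent-correct means and SDs (experimental vs. control, n = 8 each).
t_word, df = independent_t(77.1, 13.6, 8, 56.8, 20.1, 8)
t_comp, _ = independent_t(48.9, 32.3, 8, 50.3, 24.3, 8)
print(f"Word Study Skills: t({df}) = {t_word:.2f}")
print(f"Reading Comprehension: t({df}) = {t_comp:.2f}")
```

Both values fall close to the reported t ratios of 2.38 and -.10.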

It was seen that learning-disabled students trained in test-taking skills
significantly outperformed their untrained peers on the Word Study Skills
subtest, but not the Reading Comprehension subtest, of a modified version of the
Stanford Achievement Test. Although it is not certain why performance was
improved on one subtest but not another, it is possible that performance on
the Word Study Skills subtest was more easily trained because this subtest
contained several different formats, introduced over a short period of time, which
may have been confusing to the control students. The resulting effect size of
this subtest (1.01 SD units) as well as the total score effect size (.63 SD units)
are substantially larger than those reported in the literature (1, S) and may
indicate the deficit in test-taking skills may be somewhat stronger for this
sample than others as supported by recent research.
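
The Word Study Skills effect size can be recovered from the reported descriptive data. The sketch below assumes the effect size is the experimental-minus-control mean difference divided by the control group standard deviation; this divisor is an assumption for the present study, and the function name is hypothetical.

```python
def effect_size(mean_experimental, mean_control, sd_control):
    """Standardized mean difference using the control group SD as divisor."""
    return (mean_experimental - mean_control) / sd_control


# Reported Word Study Skills percent-correct means and control group SD.
es_word = effect_size(77.1, 56.8, 20.1)
print(f"{es_word:.2f}")  # 1.01
```

Under this assumption the computed value agrees with the 1.01 SD units reported for the Word Study Skills subtest.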

⁴Since subjects were matched, it is possible to compute t tests for correlated data; this was
not done here since scores of matched subjects were not correlated on the tests.
Corresponding t ratios for correlated data (df = 7) were essentially equivalent at 2 J)
(p = .06), -0.10, and 1.12 for Word Study Skills, Reading Comprehension, and
total subtests, respectively.
At least some aspects of the training appear to have been effective in
raising test performance; however, the use of a no-treatment control group
prohibits drawing conclusions regarding what specific aspects of the training
were most effective. Further research could help clarify these variables.

Although it is true that the use of standardized achievement tests in special
education is a controversial issue (4), it is also true that it is the obligation
of special education personnel to maximize the functioning of learning-disabled
students whenever possible, including performance on standardized
achievement tests. It is also true that the skills taught for use on the Stanford
Achievement Test may be even more valuable for teacher-made tests which may
contain even more cues for the effective use of test-taking skills. Although the
findings of the present investigation are promising, the small sample and the
reduced version of the Stanford Achievement Test used as a dependent measure
indicate that replication of these findings is necessary.
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Tlu» iMipiilar cdiKvptNiii «if lest •wisom'ss is n»viowi»<l ami (*valii«il(*d Aldidii^h soim* 
Kiipixirt f<ir the coiuvpl of tost •wi.soik^ss cmsIs. in )*(MU*ral ilu* innuence <if 
wis<Mi(*sslia.s()OiMi){nMilyov(*R*s(iinat(*d particularly with rrsiHVl l<> (a)<'<intriliii- 
tiontoiiirasuroiiuMil error. (Ii) cult lira I (lirrm*n(vs.(r) iim1o|h»iuUmuv from general 
uitellijiciicc, anil (<1) facility for Iraiiiin^. This art irk* places ciHunuin statciiiciiLs 
rc);ardiiiK t(*st'Wi.s(*ncss in |)oi's|MH:tiv(* of actual research (iiuhiifts. 



It has been known for many years that all test scores reflect two additive
elements: "true" score, accounting for the construct being measured, and
"error" score (Magnusson, 1967). It has also been suggested that the error
score may be itself composed of several additive components (Ebel & Damrin,
1960; Thorndike, 1951). These components have been said to include test
anxiety (e.g., Sarason, 1978), achievement motivation (e.g., Atkinson, 1974;
Chapman & Hill, 1971), and self-esteem (e.g., Roen, 1960). Such possible
elements of measurement error have been discussed in detail by Jensen (1980).
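
The additive model described in this paragraph can be written compactly. The
symbols used here (X for the observed score, T for the true score, E for the
error score, and E_k for the individual error components) are notational
assumptions introduced only for illustration; they do not come from the
sources cited.

```latex
\begin{align}
  X &= T + E                 && \text{observed score = true score + error score} \\
  E &= \sum_{k=1}^{K} E_k    && \text{error as a sum of components, e.g. test anxiety,} \\
    &                        && \text{achievement motivation, self-esteem}
\end{align}
```

Under this reading, test-wiseness would enter as one further component E_k of
the error term.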

Since 1965, an additional construct has been discussed repeatedly in the
literature which is commonly thought to involve a substantial source of
measurement error. This construct was defined by Millman, Bishop, and Ebel
(1965), as "test-wiseness" (TW). Millman et al. defined TW as "a subject's
capacity to utilize the characteristics and formats of the test and/or the
test-taking situation to receive a high score" (p. 707). They further
described TW as "logically independent of the examinee's knowledge of the
subject matter for which the items are (sic) supposedly measures" (p. 707).
Ebel (1965) has suggested that error in measurement is more likely to be
obtained from students low in test-wiseness. The student low in TW,
therefore, may be more of a measurement problem than the student high in TW
(Slakter, Koehler, & Hampton, 1970).



Analysis and Measurement of
Test-Wiseness

Millman, Bishop, and Ebel (1965) have provided a definition and analysis of
the construct on which most subsequent research has been based (Sarnacki,
1979). Millman et al. defined TW as distinct from general mental attitudes
such as confidence and anxiety, and motivational states of the test-taker. In
their analysis of TW, six elements were delineated. Four of these elements
were considered to be independent of the test constructor or test purpose,
while two were considered to be dependent on test constructor or test
purpose. The four independent elements included (a) time using strategies,
(b) error avoidance strategies, (c) guessing strategies, and (d) deductive
reasoning strategies. Time using strategies included working quickly and
efficiently and saving more difficult or time-consuming items for last. Error
avoidance strategies included attending to directions, marking answers
carefully, and checking all answers. Guessing strategies were considered to
be the use of guessing when it was likely to benefit the test-taker.
Deductive reasoning strategies included elimination of items known to be
incorrect, item choices based on an analysis of the relation among items,
such as choosing neither of two items which imply the correctness of each
other (similar options), and use of content information from other test items
and options.

The two elements thought to be de- 
pendent upon test constructor or purpose 
were intent consideration strategies and 
cue-using strategies. Intent consideration 
strategies included adopting the appro- 
priate level of sophistication for the test, 
and considering the purpose of the test 
constructor. Cue-using strategies referred
to the use of any consistent idiosyncrasies
of the particular test constructor, such as
inclusion of more true or false statements,
placement of correct distractor, and
grammatical inconsistencies between stem and
options. Avoidance of items using the
words "always" and "never" (specific
determiners) was also considered a
cue-using strategy.

Researchers have typically assessed 
TW in one of two indirect ways. One method
is to teach TW skills to a population and 
assess the extent to which scores improve. 
The other method is to construct questions 
which are answerable only by use of spe- 
cific TW skills and embed these items in a 
larger test of answerable items. An exam- 
ple of an item answerable in terms of a TW 
strategy (similar options) was given by 
Slakter, Koehler, and Hampton (1970, p. 
249):

"When Bestor crystals are added to
water:

1. Heat is given off;

2. The temperature of the solution
rises;

3. The solution turns blue;

4. The container becomes warmer."

The keyed answer to this item is (3), since the other options imply the
correctness of each other. In a similar fashion, guessing strategies have
been assessed by indicating a penalty for incorrect responses, and embedding
nonsense items for which no answer is correct. The extent to which subjects
answer such nonsense items was considered a measure of guessing strategies
(Slakter et al., 1970). Finally, such general TW strategies as use of prior
or partial knowledge, deductive reasoning, and use of prior items have been
assessed by administering reading comprehension test questions for which the
referent reading passages have been deleted (e.g., Dunn, 1981; Lifson,
Scruggs, & Bennion, 1984; Scruggs & Lifson, 1985).

Since the initial analysis by Millman et al. (1965), a voluminous literature
has emerged, reviews of which have been written by Bangert-Drowns, Kulik and
Kulik (1983), Ford (1973), Fueyo (1977), Jones and Ligon (1981), and Sarnacki
(1979). These reviews are all thorough to the extent that they cover
adequately the body of literature referring to TW as it has been evaluated
over the past two decades. It is the view of the present authors, however,
that much of the influence associated with TW has been overstated to the
point of distortion. It is the purpose of the present paper to clarify some
issues regarding the construct "test-wiseness" and its consequences.

Commonly made statements regarding TW which are considered to be "myths" (by
the present authors) include the following: (a) there is no substantial
correlation between test-wiseness and intelligence; (b) TW constitutes a
large source of variance which is commonly found in tests; (c) different
American cultural groups are seen to differ substantially with respect to
test-wiseness; and (d) test-wiseness is easily trained and results in
substantial increases in test scores. These "myths" will be considered
separately, followed by a review of literature relevant to each, and a
discussion of the realities associated with each particular myth.

Myth #1: Test-Wiseness is Not
Substantially Related to
General Intelligence

This myth is based largely upon the assumption that TW constitutes
essentially an unfair advantage on test-taking tasks which some students have
happened to acquire arbitrarily, while others have not. In addition, TW loses
much credibility as a construct if it can be shown to be highly related to
intelligence, and therefore not a specific, independent factor. Finally, if
TW is not strongly related to intelligence, then it appears more likely that
it can be easily trained; consequently, groups who can be shown to suffer
with respect to TW would hypothetically benefit greatly from short
instructional lessons in TW.

Millman et al. (1965) suggested that a test-wise subject would perform better
on tests than would a less test-wise subject of equal intellectual ability.
Wahlstrom and Boersma (1968) maintained, "while 'good' items may be used to
control for error variance associated with test-wiseness, the writers contend
that teacher-made achievement tests contain items with faults, and that
test-wise subjects often received higher scores than subjects of equal
intellectual ability" (p. 419).

The basis for this particular myth is 
found in a small number of empirical stu- 
dies, whose interpretations have been 
greatly distorted. These investigations will
be discussed in turn. 

Dunn and Goldstein (1959) correlated scores on a group administered
intelligence test (Army Aptitude Area 1) with scores on blocks of multiple
choice items containing specific item flaws. These authors argued that since
moderate correlations (.62-.72) were found between IQ and item blocks
containing different TW cues as well as items containing no TW cues, "the
ability to pick up cues on the type of material tested may be found at all
levels of intelligence" (p. 178). In this investigation, however, no direct
assessment of the relation between IQ and TW was made.

Kreit (1968) hypothesized that the intelligence of subjects is related to the
acquisition of test-taking skills, and that more intelligent children would
improve more from test session to test session. This hypothesis was not
supported. Kreit reported only nonsignificant trends in the hypothesized
direction. In this investigation, however, narrow and overlapping groups
comprising his sample precluded a fair assessment of his hypothesis. This
author, then, did not demonstrate the lack of a strong relation, but merely
failed to support his own predictions with respect to one aspect of the
TW-intelligence issue.

The most commonly cited study with respect to test-wiseness and intelligence
was conducted by Diamond and Evans (1972). These researchers concluded that
TW is cue-specific (that is, not one general ability) and that the overall
correlation between the aspects of TW tested was not strong. In fact, the
overall correlation between IQ and TW reported by Diamond and Evans was .49
which, if corrected for attenuation of the somewhat unreliable test-wiseness
test, becomes a correlation of .61. In either case, the obtained correlation
is strong enough to constitute a moderate relation between test-wiseness as
measured and general ability. The conclusions of Diamond and Evans, although
unwarranted, have been consistently cited by others more interested in
perpetuating the myth of this aspect of TW than accurately reporting the
data.
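
The correction for attenuation mentioned in this paragraph can be sketched as
follows. This is a minimal illustration, assuming the standard Spearman
formula (observed r divided by the square root of the product of the two
reliabilities). The reliability actually used is not reported in the text, so
the sketch only back-solves the reliability product implied by the two
reported correlations (.49 and .61). The function names are hypothetical.

```python
import math


def disattenuate(r_xy: float, rel_x: float, rel_y: float = 1.0) -> float:
    """Spearman's correction for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    if not (0.0 < rel_x <= 1.0 and 0.0 < rel_y <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


def implied_reliability_product(r_observed: float, r_corrected: float) -> float:
    """Reliability product implied by an observed and a corrected correlation."""
    return (r_observed / r_corrected) ** 2


# Figures reported in the text: observed r = .49, corrected r = .61.
implied = implied_reliability_product(0.49, 0.61)
print(f"implied reliability product: {implied:.3f}")

# Round trip: applying the correction with that product recovers the
# corrected correlation reported in the text.
print(f"corrected r: {disattenuate(0.49, implied):.2f}")
```

If only the test-wiseness test was corrected (an assumption; the text does
not say), the implied product would be the reliability of that test alone.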

Other researchers, not as widely cited, have provided stronger information
that TW and intelligence are in fact related. Anderson (1973) reports,
"analysis of the correlational data indicates that for the total sample a
significant (though moderate) correlation is obtained between TW and mental
ability, between TW and achievement, and between TW and deductive reasoning
ability" (page 89). Millikin (1975) correlated performance on a test-wiseness
test and a general mental ability test on a sample of 506 eleventh grade
subjects, and found a significant relation between a measure of general
ability and TW.

Taken as a whole, the bulk of the research literature seems to indicate that
a substantive correlation is typically found between TW and tests of mental
ability, allowing for a tangible amount of shared variance. Apparently,
however, these findings have not satisfied other authors in the field of TW,
for the previously reviewed articles are generally selectively cited as
providing evidence that TW and intelligence are not correlated significantly.
Thus, Dillard, Warrior-Benjamin, and Perrin (1977) maintained, "Kreit (1967)
found that improved test-wiseness and intelligence were not significantly
related" (p. 1135). Likewise, Crehan, Gross, and Koehler (1978) cited Diamond
and Evans and reported, "previous research has shown that TW is not highly
related to cognitive ability" (p. 40). Crehan, Koehler, and Slakter (1974)
also cited Diamond and Evans and reported, "investigators examining the
cognitive correlates of TW have concluded that TW is not highly related to
cognitive ability" (p. 209). This myth has also been maintained by those who
simply assert that students equal in intelligence may differ in TW. For
instance, Gross (1977) asserted "(TW) concerns the extent to which examinees
of similar ability of achievement received different test scores as a result
of differences in test-taking shrewdness" (p. 97). Wahlstrom and Boersma
(1968) asserted ". . . test-wise Ss often receive higher scores than Ss of
equal intellectual ability" (p. 419).

It can, therefore, be seen that in spite 
of substantial evidence linking general 
reasoning ability and measures of test- 
wiseness, researchers have continued to 
report the lack of a relation between the 
two variables. The reasoning for this is 
uncertain, although it no doubt reflects in 
part an interest in (a) defending the con- 
struct of TW as one separate from intelli- 
gence, and (b) consequently, implying that 
such ability is easily trained and manipu- 
lated. To this end, relevant data have been 
misinterpreted, or simply ignored. In
addition to the empirical findings of
correlations between TW and intelligence, and
the methodological errors of those who
maintain there is no such relation, an appeal
to "common sense" can be made. High on the
list of Millman, Bishop, and Ebel's analysis
of test-wiseness is what is referred to as
"deductive reasoning strategies", of which
are included elimination of options known
to be incorrect, elimination of options
which imply the correctness (or
incorrectness) of each other, utilization of
relevant content information in other test
items, and choice of items which encompass
all of two or more given statements known to
be correct. Other strategies include a
deduction of the intent of the test
constructor and a determination of
regularities in stem or option cues on the
part of the test constructor. It would defy
credibility to assert that these "deductive
reasoning" strategies are not related to
general mental ability.

As with most myths, however, elements
of truth remain. If it is obvious that
many test-taking strategies are strongly 
dependent upon the reasoning skills of the 
test-taker, it is also obvious that some 
other strategies can be easily taught and 
involve little reasoning ability. These in- 
clude such strategies as working quickly, 
moving past items which resist a quick 
response, answering all questions, using 
time remaining after the completion of 
tests to reconsider answers, asking the 
examiner for clarification of ambiguous 
questions, guessing whenever necessary, 
and developing prior familiarity with spe- 
cific test format demands. These strategies 
also comprise a component of test-wiseness
and have been successfully trained to 
mildly handicapped students at the 
primary-age level, to the extent that per- 
formance on achievement tests has been 
enhanced (Scruggs, in press; Scruggs & 
Tolfa, 1985; Scruggs & Mastropieri, in 
press). Although such strategies as those 
previously mentioned do not typically 
appear on "tests of test-wiseness," these
strategies may be, in fact, somewhat inde- 
pendent of intelligence and therefore sub- 
ject to relatively simple remediation. To 
this extent, then, the issue of test-wiseness
not being related to intelligence does have
some support. To the extent to which this
myth has been reported in the literature,
however, it must be challenged; that is,
TW is not a construct which students 
happen to acquire by chance or serendip- 
ity, which is unrelated to intelligence, and 
which results in substantial fluctuations of 
scores in achievement tests. 



Myth #2: Test-Wiseness Constitutes a
Large Source of Variance/
Test-Wiseness Cues are
Commonly Found on Tests

Although it is clear that some students are less able to "outguess" certain
test items than their "test-wise" peers, the issue at stake in this
particular myth revolves around whether or not the amount of variance
associated with TW is large. Some authors have simply reported that TW is a
potential source of error. Gross (1977) argues, "Millman, Bishop, and Ebel
(1965) have advocated that TW be taught to minimize inter-examinee TW
differences, thereby reducing measurement error . . ." (p. 97). Gross (1977),
referring to Ebel (1965), writes, "more error in measurement is likely to
originate from students who have too little, rather than too much, skill in
taking tests" (p. 97). Sarnacki (1979) writes, "TW is widely recognized as a
source of additional variance in test scores and is a possible depressor of
test validity" (p. 253). Some authors, however, have magnified the importance
of this argument and have written that, in fact, the source of error in
test-wiseness is extensive. Thus, Wahlstrom and Boersma (1968) maintained,
"an important source of variation in test scores is test-wiseness" (p. 413).
McPhail (1978) argued, "test-wiseness operates as error variance and its
effect is to reduce the validity and reliability of tests" (p. 168).
Kalechstein, Kalechstein, and Doctor (1981) maintained, "test-wiseness has
been considered a potentially large source of error variance" (p. 198).

The fact that TW accounts for a source of error variance is indisputable. The
question here is whether, in fact, TW constitutes a large source of variance
and whether TW cues are commonly found in tests. The basis for the magnitude
of the effect of TW derives largely from a confusion between the terms
"statistically significant" and "practically important." For example,
Sarnacki (1979) cites a number of studies for which statistically significant
increases in test scores were associated with training in TW (e.g.,
Callenbach, 1973; Gross, 1976; Oakland, 1972). Although Sarnacki is correct
that these researchers did, in fact, exert a "significant" increase in test
scores as a result of training in TW, the fact is that in virtually all
cases, the effect sizes were quite small (this issue will be discussed
further under the "easily trained" myth). In fact, the very studies that
Sarnacki cites are stronger arguments in favor of the issue that TW is a
relatively small source of variance in achievement test scores. One specific
study is worthy of mention. Sarnacki cites Gross (1976) as evidence that
significant increases in test scores were associated with training in TW. A
review of this dissertation, however, demonstrates that three selected TW
behaviors were taught. These behaviors included risk taking, deductive
reasoning, and time using. The dependent measure was the Metropolitan
Achievement Test (MAT) Advanced Battery. Gross concluded that (a) deductive
reasoning was not successfully taught (see "TW not correlated with IQ" myth),
(b) risk taking (i.e., guessing) exerted a significant influence on test
score only when guessing was inhibited in control conditions, and (c)
although time using was successfully taught, it did not affect test score.
Thus, the very dissertation cited by Sarnacki suggests that TW constitutes a
relatively small source of variance.

In one of the most thoughtful investigations of TW, Rowley (1974)
administered vocabulary and mathematics test items in both free response and
multiple choice formats. Partial correlations were computed between scores on
multiple choice items and measures of TW and risk-taking (RT), with free
response scores partialed out. Rowley found significant partial correlations
between vocabulary scores and TW and RT measures, and concluded that use of
multiple choice tests "can result in high risk-taking, test-wise examinees
scoring more highly than other examinees whose knowledge and ability are the
equal of theirs" (p. 21). Analysis of the actual extent of performance
advantage of students high in TW is difficult, because gain scores (from free
response to multiple choice) were not reported. Examination of correlational
data, however, indicates that TW and RT were not correlated at all with
mathematics multiple choice items (partial r's = near 0) and that the partial
correlations with vocabulary items were not high (r's of .27 and .14 for TW
and RT, respectively) when guessing was not penalized (see Gross, 1976). In
this investigation, then, TW was seen to account for 7% of the variance in
vocabulary test performance, while RT accounted for less than 2% of total
vocabulary test variance. When this finding is considered with the near zero
correlations between TW, RT, and mathematics test performance, the conclusion
that such factors constitute a large source of variance is difficult to
justify.
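
The percentages quoted in this paragraph follow from squaring the partial
correlations. The short sketch below reproduces that arithmetic; it assumes
the usual reading of a squared correlation as the proportion of variance
accounted for, and the function name is hypothetical.

```python
def variance_explained(r: float) -> float:
    """Proportion of variance accounted for by a correlation r (r squared)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    return r * r


# Partial correlations with vocabulary items as reported in the text.
partial_rs = {"TW": 0.27, "RT": 0.14}

for label, r in partial_rs.items():
    share = variance_explained(r)
    print(f"{label}: r = {r:.2f}, variance explained = {share:.2%}")
```

With the reported values, the TW figure comes out at roughly 7% and the RT
figure just under 2%, which matches the percentages given in the text.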

Another argument in favor of the "large source of variance" myth comes from
analyses of tests themselves. Metfessel and Sax (1958) looked for bias in
placement of key to correct answers and found that more questions were keyed
"true" on true-false tests than "false." They argued that 42% of the tests
that they studied were found to have answer placement flaws that may conspire
with response sets to artificially inflate scores. Even if these data are
true, the point remains that test-takers would need to know ahead of time in
which direction keyed items were biased in order to make any benefit of these
flaws. The strongest argument with respect to Metfessel and Sax's analysis,
however, is that although they document the possibility of placement flaws
which may artificially inflate scores, they offer no quantitative data which
support that these cues actually do result in inflated scores.

In order to investigate more fully whether TW cues are commonly present in
achievement tests, the present authors have recently examined five major
standardized achievement tests (California Achievement Test, Metropolitan
Achievement Test, Comprehensive Test of Basic Skills, Iowa Test of Basic
Skills, and Stanford Achievement Tests) for presence of TW cues, including
specific determiners, similar options, stem options, or absurd options as
defined by Slakter et al. (1970). We independently evaluated all test items
for the presence of these cues and afterwards computed a 96% coefficient of
agreement on TW cues. Nevertheless, we found that such TW cues exist in less
than half of 1% of items on all these tests, substantially different from the
"large source of variance" TW cues are supposed to encompass.
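
As an illustration of how an agreement coefficient and a cue rate of this
kind might be computed, the sketch below uses simple percent agreement
between two raters. This is an assumption: the text does not say which
agreement coefficient was used. The ratings are invented for the example and
are not the authors' data.

```python
def percent_agreement(rater_a, rater_b):
    """Share of items on which two raters gave the same judgment."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cue_rate(ratings):
    """Share of items flagged as containing a TW cue."""
    return sum(ratings) / len(ratings)


# Hypothetical ratings for 10 items: True means "item contains a TW cue".
rater_a = [False, False, True, False, False, False, False, False, False, False]
rater_b = [False, False, True, False, False, True, False, False, False, False]

print(f"agreement: {percent_agreement(rater_a, rater_b):.0%}")
print(f"cue rate (rater A): {cue_rate(rater_a):.0%}")
```

Percent agreement is the simplest choice; a chance-corrected index would give
a lower figure when cues are as rare as they are reported to be here.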

Another argument which can be made 
is that although such cues are not commonly
present in standardized tests, they
are present to a large extent in teacher- 
made tests. To this end, some studies have 
indicated that training in TW skills does 
not critically influence performance on 
standardized achievement tests, but does 
influence performance on multiple-choice 
tests with poorly made distractors, which 
are then argued to be representative of 
teacher-made tests. Thus, Wahlstrom and 
Boersma (1968) have argued that TW 
training increases scores on "poorly made"
tests but does not increase scores on 
standardized test items. Although there 
may or may not be some truth to this 
argument, there is a logical flaw in it. Those
who advocate training in TW to improve 
scores on poorly constructed test items are 
in essence arguing that teachers should 
teach their students how to outguess their 
poorly constructed tests. Such an argu- 
ment is not logically sensible, and in addi- 
tion, suggests that outguessing test items 
for which the content is not known would 
result in more, rather than less, measurement
error. At any rate, the interests of the
teacher and students would be better
served by putting additional time into 
training the teacher to construct better 
items, rather than teaching the students to 
outguess them more effectively.



Myth #3: Cultural Differences
Exist in Test-Wiseness

It has been assumed as far back as the "codification" of TW in the original
article by Millman, Bishop, and Ebel (1965) that TW of the type found on
objective tests is culturally determined. One of the more widely cited
references to this myth is by Millman and Setyadi (1966) who compared the
performance of American and Indonesian students on open-ended and
multiple-choice questions. The American students enjoyed an advantage on the
objective questions, even after the Indonesian students were familiarized
with the mechanics of choosing the correct answer. Furthermore, Lo and
Slakter (1973) compared Chinese and American students on an instrument meant
to measure TW and risk-taking in test circumstances. These two articles have
been commonly cited by researchers as evidence that some ethnic/cultural
groups in the United States may score lower on achievement tests because of
"cultural" differences in TW. This possibility has led to much research on
training American minority groups on TW skills. Often, however, deficiencies
in TW exhibited by minority groups have simply been assumed rather than
documented. Slakter, Koehler, and Hampton (1970) maintain "the objective of
[a TW] learning program would be not only to decrease the errors of
measurement mentioned by Ebel (1965, p. 206), but to decrease the handicap
under which many examinees apparently operate. For example, certain subsets
of the population (black students, rural students, etc.) score lower on
achievement tests than the population at large" (p. 253). The assumption by
these authors is that much of the difference in achievement test scores is
due to cultural influences in TW, and not lower levels of achievement in
general.

Evidence presented to support the assertion that minority groups lack TW,
however, is often tenuous. For example, when Kalechstein, Kalechstein, and
Doctor (1981) cited Ortar (1960), among others, in their statement, "several
investigators have noted the lack of test-wiseness in culturally different
children" (p. 198), they implicitly referred to American minorities. Ortar
actually speaks of the difficulties in using standardized tests when faced
with a culturally diverse population, stating that under such circumstances,
the assumption of equality of past experience cannot be made. It is not clear
that this statement is accurate when applied to inner city, black, or lower
socioeconomic status students.

Most empirical studies attempting to document differences in TW between
ethnic/cultural groups consist of either (a) the administration of a TW
instrument to different cultural groups, or (b) attempts to evaluate the
impact of TW training on the subsequent scores on a TW instrument or a real
standardized test. Despite the concern expressed by many researchers (e.g.,
Ebel, 1965; Ortar, 1960) that score differentials may be related to
between-group deficits in TW, relatively little research has focused on
identifying that deficit. For example, Kalechstein et al. (1981) cited
previous investigators who have described the lack of TW in culturally
different/disadvantaged groups, but themselves administered a TW training
program to a group of black, disadvantaged second graders without reference
to a supposedly "advantaged" group. However, it may be that all second
graders as a group are relatively inexperienced with tests in general. The
performance of black second graders after exposure to a TW treatment in the
absence of comparison to other groups, therefore, tells us relatively little
concerning cultural group differences in TW. Thus, Kalechstein et al. have
not established that achievement tests are less valid for the group they
studied. What they have done is replicated the study by Callenbach (1970)
with a different population and raised questions not directly addressed in
their own investigation. Likewise, Dreisbach and Keogh (1982) successfully
trained TW skills to Mexican-American children and commented "test-wiseness
may be particularly important when testing children from economically
disadvantaged backgrounds and/or where the primary language of the home is
not standard English" (p. 228). Although language of test administration and
language competence of the child were also investigated, the primary focus of
this investigation was the hypothesis that Mexican-American children "lack
'test-wiseness' and thus do poorly on tests" (p. 224). Differential effects
of training for low SES or minority populations, however, were not
investigated in their study and leave unanswered the issue of whether such
training is in fact "particularly important" for low SES or minority
populations.

In contrast to the questionable support of cultural/minority differences in
TW, there is evidence that these groups differ little with respect to TW. In
a dissertation by Yearby (1975) in which SES, race, and sex were controlled,
no significant differences were observed between the groups on the
test-taking skills pretest. Another study which directly addressed the
question of whether disadvantaged or minority populations lack TW was
conducted by Diamond, Ayres, Fishman, and Green (1976). Although the study
was clearly designed to indicate relative deficiencies in TW on the part of
black inner-city children, support for this hypothesis was not found. It was
found that black inner-city children performed significantly above chance on
a TW instrument, and that scores on the TW instrument did not predict grades
on the Verbal Achievement subtest of the California Achievement Test. This
suggests that it can neither be assumed that disadvantaged or minority groups
lack TW, nor that a relation between TW and achievement test scores exists in
these groups. In a review by McPhail (1976), it was concluded that "TW
studies conducted on black and other minority student populations . . . have
been inconclusive" (p. 168). Although it may be argued that direct
evaluations of relative levels of test-wiseness in minority and nonminority
groups are lacking, it must be maintained that at present the assertion of
American minority groups being lower in test-wiseness, and this deficiency
being responsible for much of the performance differences between groups, is
largely unsupported.

As in most contemporary myths, however, a degree of truth can be
discerned. Although studies which compare the effectiveness of
test-wiseness training between minority and nonminority groups have
not been found, a recent investigation does offer some support for
the "cultural difference in TW" issue. Through meta-analysis
procedures, Scruggs, Bennion, and White (in press) have been able to
make quantitative comparisons in the effectiveness of TW training on
achievement test scores of minority and nonminority groups which
were not directly assessed by individual studies. Scruggs et al.
evaluated 24 empirical studies which investigated the effects of TW
training on elementary school students, grades 1 through 6. It was
found that with less than 4 hours of treatment, neither "low SES"
nor "not low SES" subjects benefited appreciably (average effect
sizes of -.05 and .08). With more than 4 hours of treatment,
students from low socioeconomic backgrounds benefited more than
twice as much as students who were not from low SES backgrounds
(average effect sizes of .44 vs. .20). Since low SES subjects under
these circumstances appeared to benefit more than twice as much as
their counterparts from higher SES groups, the finding implies that
children from low SES backgrounds are somewhat deficient with
respect to TW. In addition, most students representing low SES
groups in the studies evaluated were also members of inner city
minority groups. It must be noted, however, that the effect size
differential for a student receiving 4 or more hours of treatment
from low SES and not low SES backgrounds was .24 standard deviation
units, a relatively small difference which in no way could account
for the large performance differences seen between SES groups on
achievement tests. Although the Scruggs et al. (in press) study
provides some evidence that students from low SES and minority
backgrounds may suffer somewhat with respect to TW skills, these
deficiencies explain little of performance differences between the
two groups.
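
The comparison above rests on simple arithmetic with the four average
effect sizes quoted from Scruggs et al. (in press). The following is a
minimal Python sketch of that arithmetic. Only the four effect sizes
come from the text; the variable names and dictionary layout are
illustrative assumptions.

```python
# Average effect sizes quoted in the text (Scruggs et al., in press).
# The dictionary layout and key names are illustrative assumptions.
effect_sizes = {
    "under_4_hours": {"low_ses": -0.05, "not_low_ses": 0.08},
    "over_4_hours": {"low_ses": 0.44, "not_low_ses": 0.20},
}

long_training = effect_sizes["over_4_hours"]

# Differential between the two SES groups, in standard deviation units.
differential = round(long_training["low_ses"] - long_training["not_low_ses"], 2)

# Relative benefit of the low SES group ("more than twice as much").
ratio = round(long_training["low_ses"] / long_training["not_low_ses"], 2)

print(differential)  # 0.24
print(ratio)         # 2.2
```

The ratio of 2.2 and the differential of .24 describe the same pair of
numbers: the relative benefit is large, while the absolute gap is about
a quarter of a standard deviation.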



MYTH #4: TEST-WISENESS IS EASILY
TRAINED AND RESULTS IN LARGE GAINS IN
TEST PERFORMANCE

This myth is related to the "large source of variance/commonly
found" myth in which statistical significance has been confused with
practical importance. For example, Sarnacki (1979) referred to
Gaines and Jongsma as having concluded "that TW can be taught in a
relatively short amount of time with significantly higher
performance on standardized tests resulting." Slakter goes on to
cite several others who "significantly" raised achievement test
scores by TW training (e.g., Callenbach, 1973; Gross, 1970;
Wahlstrom & Boersma, 1968). An analysis of a number of significant
versus nonsignificant differences, however, says little about the
relative size of the effect of training. In a recent meta-analysis,
Bangert-Drowns, Kulik, and Kulik (1983) indicated that training in
TW resulted in average effect sizes on achievement test scores of
.29. On the primary grade levels, this effect size would be
equivalent to approximately 3 months of academic achievement, not a
large difference by educational standards. In a more recent
meta-analysis, however, using somewhat different criteria for
evaluating effect sizes, Scruggs, Bennion, and White (in press)
determined that the average effect size in the elementary grades for
raising scores on achievement tests was .10, less than half of that
reported by Bangert-Drowns et al., reflecting grade equivalent
increases of questionable significance. It was only after relatively
long-term training (i.e., longer than 4 hours) that the resulting
effect sizes began to resemble those reported by Bangert-Drowns et
al. This finding demonstrated by meta-analysis in the elementary
grade level has recently been demonstrated to be true with
college-bound students on the Scholastic Aptitude Test (DerSimonian
& Laird, 1983). Thus, it appears that the notion that TW is easily
trained and results in substantially higher test scores is
unjustified.
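
The text equates an effect size of .29 with roughly 3 months of
achievement at the primary grade levels. The sketch below applies
that same ratio to the .10 figure to show what "questionable
significance" might look like in months. It assumes, purely for
illustration, that the conversion is linear; that assumption and the
function name are not from the source.

```python
# Conversion implied by the text: an effect size of .29 corresponds to
# approximately 3 months of achievement at the primary grade levels.
REFERENCE_EFFECT_SIZE = 0.29
REFERENCE_MONTHS = 3.0


def months_of_achievement(effect_size: float) -> float:
    """Scale an effect size to months, ASSUMING a linear conversion."""
    return effect_size * REFERENCE_MONTHS / REFERENCE_EFFECT_SIZE


# Average effect sizes reported in the two meta-analyses cited above.
bangert_drowns_months = months_of_achievement(0.29)
scruggs_months = months_of_achievement(0.10)

print(round(bangert_drowns_months, 1))  # 3.0
print(round(scruggs_months, 1))         # 1.0
```

Under this assumption the .10 effect size amounts to about one month
of achievement, which is consistent with the description of gains
that are small in practical terms.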

Another argument that TW is easily trained comes from researchers
who trained selected aspects of TW and measured performance on the
basis of a TW instrument (e.g., Gibb, 1964; Slakter et al., 1970;
Moore, Schutz, & Baker, 1966). It was found that TW training does
substantially and easily increase scores on TW instruments, and
these findings have been supported by the meta-analysis of Scruggs
et al. (in press). Although this type of training does seem to be
effective in promoting scores on TW tests, the extent to which this
training raises scores on actual tests remains relatively small.

Another argument offered by those who maintain TW is "easily
trained" is that although TW cues are not common on standardized
achievement tests, they are common on flawed teacher-made tests, and
it is on these types of tests that TW training is most beneficial.
This issue has been addressed above. Although it seems absurd for
teachers to teach their students to "outguess" their own poorly
constructed tests, the idea of training teachers to construct better
test items is often dismissed out of hand. Sarnacki (1979) argues
unconvincingly that even if teachers are trained in the principles
of TW, item faults may still occur. One may just as easily assert
that students may forget some of the TW skills they were taught. In
fact, if the same amount of time was spent training teachers to
construct better test items, it is logical to assume that less,
rather than more, error would result than if students were trained
to guess correctly the answers to questions they do not understand.
In summary, it can be stated that (a) relatively small gains in
standardized test performance have been achieved only after
extensive training, and (b) although effects are greater for poorly
constructed items, training in this area is more difficult to
justify.

In spite of this present, rather pessimistic appraisal of the
"easily trained" myth, however, a positive hypothesis, which has
only recently received some research support, does remain. Although
group differences with respect to TW training have been relatively
small, it is possible that there exist certain individuals (or small
groups) for whom TW is both necessary and beneficial and for whom
relatively large differences in performance can be achieved. It has
been seen that students classified as mildly handicapped (i.e.,
learning disabled and behaviorally disordered) may differ from their
nonhandicapped peers with respect to (a) attitudes toward tests
(Scruggs, Mastropieri, Tolfa, & Jenkins, 1985), and (b) spontaneous
production of effective test-taking strategies, including the
effective utilization of test format (Scruggs, Bennion, & Lifson,
1985), selection of an appropriate test-taking strategy (Scruggs,
Bennion, & Lifson, in press), and use of prior or partial knowledge
and deductive reasoning (Scruggs & Lifson, 1985). A recent
experiment in TW training of regular third grade students has
indicated that TW training benefited the lower half of the class
much more so than the upper half (Scruggs, Bennion, & Williams,
1984). Such differences were seen to "wash out" when scores of the
trained group as a whole were combined. Finally, successful training
of test-taking skills has recently been achieved in special
education populations (Dunn, 1981; Lee & Alley, 1981; Scruggs &
Tolfa, 1985; Scruggs & Mastropieri, in press). The obtained effect
sizes in these initial investigations have tended to be somewhat
larger than those obtained on nondisabled populations, and there is
the added feature that many of these students are functioning within
a level at which relatively slight changes for better or worse on
achievement test performance may result in more serious decisions
regarding educational placement. In other words, although gains have
typically been small and of less consequence for normally achieving
students, even relatively small gains may be of greater importance
to students functioning at the lower end of the distribution. Also,
mildly handicapped groups do in fact exhibit less efficient
test-taking strategies than their nonhandicapped peers, and it would
seem logical to assert that these students should be trained to
utilize the same strategies that other students are spontaneously
using.

SUMMARY AND CONCLUSIONS

The present review has attempted to evaluate critically four
contemporary myths associated with test-wiseness. In this article,
we have stated that (a) the disassociation of TW from general
cognitive ability has not been verified, (b) TW has not been shown
to constitute a large source of error variance in tests, (c)
American minority groups have not been shown to be seriously lacking
in TW, and (d) relatively modest improvement in test scores has been
achieved only through long and intensive training in TW skills.
Stated more positively, TW can be said to be a tangible component of
the test-taking experience, but one which nevertheless plays a
relatively minor role in overall test scores for most students.

Several implications can be drawn from this analysis for the
practicing school psychologist. First, in many individual cases, it
may be wiser to assume TW has played a relatively minor role in test
performance. Although teachers often explain a particular student's
poor test scores by asserting he/she is simply a poor "test-taker,"
such reports may reflect either a well-intentioned but misguided
sympathy for the student, or simply a misreading of the student's
actual abilities. A psychologist who has been told that a particular
child's low scores reflect only poor test-taking skills would be
well advised to seek more tangible evidence that this is truly the
case. Second, if it can be demonstrated that a given student is
exceptionally weak in TW, there is little reason to believe that the
student could not be trained in TW skills. Finally, in the case of
special education students, it may be advisable to ensure that all
such students have had some additional guided practice on unfamiliar
test formats.

It can be concluded that although TW as a construct is weaker and
less pervasive than commonly assumed, there is nevertheless tangible
evidence of its (perhaps multifaceted) existence and some indication
that, although large groups tend to gain little from specific
training in TW, there may be certain individuals or smaller groups
for whom the construct of TW does constitute an "important source of
error." Further research in this area may do much ultimately to
clarify the issue of test-wiseness.

Academic and Intellectual
Characteristics of Behaviorally
Disordered Children and Youth

Margo A. Mastropieri, Vesna Jenkins, and Thomas E. Scruggs

ABSTRACT 

Research describing academic and intellectual characteristics of behaviorally
disordered students is reviewed. Investigations reviewed in this paper
have focused on areas of intellectual, academic, and psychosocial functioning
as they pertain to school achievement. In general, it has been found that
behaviorally disordered students exhibit academic deficiencies greater
than those exhibited on tests of intellectual functioning and perform below
average in all content areas, with particular discrepancies noted in math
functioning. In addition, variables such as locus of control, responses to the
test-taking situation, and attitudes toward academic tasks may covary with
academic performance.

All students classified as behaviorally disordered by definition are in need of
programming designed to improve social or emotional functioning. Since most of
this programming occurs in academic environments, however, it is important to
know whether students so classified also exhibit deficiencies with respect to
intellectual or academic functioning. If behaviorally disordered students are
generally found to be deficient in academic functioning, it may be necessary to
incorporate remedial instruction as a major component of the educational
environment. This review is intended to synthesize academic and intellectual
characteristics of behaviorally disordered children and youth in order to
provide a basis for future research and practice.

Two databases (Psychological Abstracts, ERIC) were examined for data-based
articles pertaining to academic and intellectual characteristics of
behaviorally disordered students. In addition, recent books on behavioral
disorders (e.g., Kauffman, 1985) were reviewed for sources. Finally, past
issues of the journal Behavioral Disorders, and the monograph series, Severe
Behavior Disorders of Children and Youth, were examined for relevant articles.
Articles were included which selected a population on the basis of disturbances
in social or emotional functioning, exclusive of psychotic or autistic samples.
By these means, 25 articles reporting data were located and are given in
Table 1.

The investigations reviewed here represent a wide range of samples of
children and youths referred to as behaviorally disordered. To this extent, any
general agreement between investigations suggests broad generalizability.
When research reports disagree, however, interpretations are more difficult. In
general, descriptions of academic and intellectual characteristics can be
divided into three main areas: (a) intelligence, (b) achievement, and (c)
psychosocial functioning and academic performance.

Preparation of this manuscript was supported in part by a grant from the
Department of Education, Special Education Programs, No. G008300008. The
authors would like to thank Ursula Pimentel for her assistance in the
preparation of this manuscript. Address requests for reprints to the first
author.

INTELLIGENCE

Studies of intellectual functioning are of relevance to the study of academic
characteristics for two reasons: (a) IQ consistently has been a strong predictor
of academic achievement (Kauffman, 1985); and (b) IQ scores can provide
information concerning ability/achievement discrepancies. The following
section describes the results of several investigations of intellectual
performance.

In 1964 Stone and Rowley reported a mean IQ of 96.5 (ranging from 62 to 135)
for 116 children referred for psychiatric services. Graubard (1964) found 21
delinquent or neglected boys in psychiatric residential treatment for 2 to 8
years to have a mean IQ of 92.3 (range 71 to 108). Schroeder (1965) reported
that for 106 students classified as psychosomatic, aggressive, exhibiting
school difficulties, school phobic, or neurotic, the average IQ was 95.95.
Motto and Lathan (1966) studied 47 school-age children in a state hospital and
reported that, as a group, they were in the dull normal range of general
intelligence. Glavin, Quay, and Werry (1971) reported IQ ranges of 89 to 112
for 11 conduct problem children placed in special classrooms. Fuller and Goh
(1981) examined 38 learning disabled and 42 emotionally disturbed public school
children and reported lower average IQ scores for the learning disabled than
for the emotionally disturbed students (86.13 and 89.50, respectively). As
recently as 1983 Forness, Bennett, and Tose reported that 92 subjects (23 girls
and 69 boys) who had been inpatients at a neuropsychiatric institute had, on
the average, IQ scores in the low 90s.

Reilly, Ross, and Bullock (1980) examined the intellectual performance of
177 adjudicated adolescents and reported a mean IQ score of 90.26, a figure
consistent with that of a previous investigation (Bullock & Reilly, 1979). In
addition, these researchers reported that subjects scored near average on the
Picture Arrangement subtest of the Wechsler Intelligence Scale for
Children-Revised (WISC-R), which requires visual sequencing of simple stories,
but lowest on those verbal subtests which require knowledge of the "outside
world": Information, Similarities, Vocabulary. Finally, a relation between IQ
performance and violent behavior was not found in this investigation.

Research on intellectual performance of disturbed children reveals that the
majority of mildly and moderately disturbed children fall only slightly below
average in IQ. These investigations taken together appear to suggest that mild
academic deficiencies could be predicted on the basis of observed intellectual
functioning. Scruggs and Mastropieri (1984) pointed out that IQ scores in
combination with achievement test scores can provide information regarding
relative discrepancies between ability and academic performance of the
behaviorally disordered population. What IQ scores cannot do is describe
behaviorally disordered students' actual levels of academic performance.
Kauffman (1985), however, does maintain that IQs of disturbed children are the
best predictors of future educational achievement. The following section
describes investigations of academic functioning.

ACHIEVEMENT 

Reading and Arithmetic 

Silberberg and Silberberg (1971) reviewed research on school achievement
and delinquency. They cited early studies by Sullivan (1927), Lane and Witty
(1934), Hill (1935), and Bond and Fendrick (1936) who found that in general
delinquents were deficient in reading achievement.

TABLE 1

Academic Characteristics Studies of the Behaviorally Disordered

Authors:  Bullock & Reilly (1979)
Subjects: 188 adolescents adjudicated for behavioral offenses
Task:     Wechsler Intelligence Scale, Wide Range Achievement Test (WRAT)
Results:  1. Average IQ of 90
          2. Average achievement deficit in all areas
          3. Discrepancies were greatest for males, minorities, older students

Authors:  Epstein & Cullinan (1983)
Subjects: 16 matched pairs (IQ, sex, CA, ethnicity); LD & BD, public school
          students
Task:     Peabody Individual Achievement Test (PIAT) and Wide Range
          Achievement Test (WRAT) were administered to both groups
Results:  BD students scored significantly higher than LD students on all
          subjects except general information subtest of PIAT and math
          subtest of WRAT

Authors:  Forness, Bennett, & Tose (1983)
Subjects: 23 girls and 69 boys who had been patients at a neuropsychiatric
          institute; mean age 10.1 years
Task:     Peabody Individual Achievement Test (PIAT) and Wechsler
          Intelligence Scale for Children-Revised (WISC-R) were administered
          to all students
Results:  Both girls and boys scored below expected levels on PIAT
          (moderately)
          Both girls and boys IQ in low 90s
          12 yr old boys worse in reading recognition and reading
          comprehension
          10 yr old girls 2.1 yrs below grade level
          12 yr old girls 1.7 yrs below grade level

Authors:  Forness & Dvorak (1982)
Subjects: 40 BD adolescents (15 males, 25 females) who had been inpatients
          at a neuropsychiatric institute; mean age 15.7 years
Task:     Comprehensive Test of Basic Skills (CTBS) was administered and
          scored under timed and untimed testing conditions
Results:  No significant test score differences except on the reading
          comprehension subtest

Authors:  Forness, Frankel, Caldron & Carter (1979)
Subjects: 34 children (CA 7.0 to 12.9) hospitalized for severe behavior
          disorders
Task:     Peabody Individual Achievement Test (PIAT)
Results:  Students were deficient in all academic areas, particularly math
          and spelling
          Longer hospitalization periods were associated with greater
          academic gains

Authors:  Fuller & Goh (1981)
Subjects: 36 LD and 42 ED children, public school setting; mean age 10 years
Task:     Wechsler Intelligence Scale for Children-Revised (WISC-R), Wide
          Range Achievement Test (WRAT), and Minnesota Percepto-Diagnostic
          Test (MPD) were administered to all students
Results:  1. Discriminant analysis procedures indicated that LD students and
             ED students could be accurately placed
          2. LDs lower than EDs on IQ, reading, spelling, and math but not on
             MPD (however, no statistical tests computed on results)

Authors:  Glavin & Annesley (1966)
Subjects: 130 BD boys and 90 normal boys in public school settings (BD
          further divided into conduct problem, withdrawn, and
          inadequacy-immaturity groups)
Task:     California Achievement Test (CAT) and Behavioral Scales (Quay &
          Peterson, 1967)
Results:  61.5% of the BD group were underachieving in reading
          72.3% of the BD group were underachieving in arithmetic
          No significant differences in performance were found between the
          conduct disordered group and the withdrawn group

Authors:  Glavin & DeGirolamo (1966)
Subjects: 9 ED and 9 regular education students; public school setting
          15 ED students classified as either conduct disordered or
          withdrawn, and regular ED students
Task:     Spelling words from Gates' A List of Spelling Difficulties in 3876
          Words (1937) were administered to both groups
Results:  1. ED students made more "internal" errors and fewer "external"
             errors than regular students
          2. Withdrawn students wrote significantly more unrecognizable words
          3. Conduct disordered students made significantly more "refusal"
             errors

Authors:  Glavin, Quay, & Werry (1971)
Subjects: Conduct problem children placed in experimental special
          classrooms. 50% Afro-American, IQs 89-112. 1967, N=11, mean age
          108 months (age range 92-132). 1968, N=12, mean age 112 months
          (age range 89-131); both years, N=8
Task:     1967, Wide Range Achievement Test (WRAT). 1968, California
          Achievement Test (CAT), pre- and post
Results:  1. 1968 arithmetic gain 1.7 years
          2. 1967 arithmetic gain 1 years
          3. 1968 reading gain 1.2 years
          4. 1967 reading gain .5 years
          5. 1968 greater emphasis on academic achievement
          6. Gain indicates program brings changes in specific
             learning-related behavior and obtains concomitant gains in
             academic achievement

Authors:  Graubard (1971)
Subjects: 108 disturbed students in special schools
Task:     Reading Achievement, Behavior Problem Checklist
Results:  1. No overall reading deficiency
          2. Observed deficiencies associated with severity of conduct
             disorder

Authors:  Graubard (1965)
Subjects: 35 disturbed delinquents incarcerated at residential treatment
          center, age range 8 years 6 months to 10 years 11 months
Task:     Wechsler Intelligence Scale for Children (WISC), Metropolitan
          Achievement Test (MAT), Illinois Test of Psycholinguistic
          Abilities (ITPA), Monroe Test of Auditory Blending (MTAB), and
          Harris Test of Lateral Dominance (HTLD)
Results:  1. BD students did not differ from normals in communication
             pattern
          2. BD students have deficits in the visual-motor channel (the
             integration level)
          3. BD students have deficits in the Auditory Vocal Automatic
             modality and in directionality

Authors:  Graubard (1964)
Subjects: 21 children in psychiatric residential treatment from 2-8 years
          (delinquent or neglected), mean age 13 years 10 months (range
          10-16), mean grade 7.9 (range 5-11), mean IQ 92.3 (range 71-108),
          all boys
Task:     Wechsler Intelligence Scale for Children (WISC), Metropolitan
          Achievement Test, Stanford Achievement Test
Results:  Difference between reading and math not significant; mean grade
          rating both tests 4.75, mean grade reading comprehension 4.87,
          mean grade arithmetic computation 4.62
          Educational disability measured by comparing mental age to reading
          and arithmetic ages. Severe reading and arithmetic disability
          found
          Not achieving commensurate with mental ages and disabled in
          academic achievement
          No evidence supporting significant difference between reading and
          arithmetic achievement in population with severe emotional
          problems over time

Authors:  Harris & King (1982)
Subjects: 242 children in grades 4 and 5 in public school settings; students
          were classified as LP (learning problem, N=33), BP (behavior
          problem, N=17), LBP (learning & behavior problem, N=19), or NP (no
          problem, N=173)
Task:     Science Research Associates Achievement Tests (SRA), Children's
          Personality Questionnaire (CPQ), L-J Sociometric Test (L-JST)
Results:  LP students achieved lower scores on SRA, were less preferred by
          peers, were less intelligent than NP and less assertive than BP
          and LBP groups
          BP did not differ from NP on SRA subtests: Reading, Math, Science,
          Use of Sources, but did differ from all groups on Language Arts
          and Social Studies
          BP did not differ from any group sociometrically
          LBP did perform lower than all groups on SRA, were preferred less
          by all groups

Authors:  Hisama (1976)
Subjects: 48 special ed children with learning and behavior problems; mean
          CA 108 months (range 96-132), public schools; 3rd or 4th graders
          46 normal 3rd or 4th graders, free from learning and behavior
          problems, randomly selected; mean CA 106 months (range 90-136)
Task:     Children's Locus of Control Scale (CLCS), Coding Test and Digit
          Symbol Test from WISC, Wechsler Adult Intelligence Scale (WAIS),
          NIM game (match game)
Results:  No significant difference in CLCS scores between normals and LD
          and BD; not externally oriented
          Coding Test showed children with internality performed better than
          those with externality
          Within experimental group, externally oriented child responded to
          success experience positively and performance depressed under
          failure condition

Authors:  Letteri (1979)
Subjects: 200 subjects (some BD, some not)
Task:     Cognitive Profile
Results:  Cognitive profile associated with low academic achievement and
          severe behavior problems is: simple, leveler, intolerant for
          ambiguous information, global, broad, non-focuser, and impulsive

Authors:  Motto & Lathan (1966)
Subjects: School-age population of state hospital. 34 boys, mean age 13
          years 1 mo (range 10-2 to 16-9). 13 girls, mean age 11 years 2 mo
          (range 9-3 to 15-1). As a group, in dull normal range of general
          intelligence
Task:     Wechsler Intelligence Scale for Children (WISC), Wechsler Adult
          Intelligence Scale (WAIS), Stanford-Binet, Form L, California
          Achievement Test (CAT), reading and arithmetic
Results:  1. Uniformity of achievement in reading and arithmetic; not
             significantly different
          2. Females: CA 1.4 below expectations in reading, CA 1.6 below
             expectations in arithmetic; MA .7 below expectancy in reading,
             and .9 below in arithmetic
          3. Males: CA 2.6 below reading expectancy; CA 3.7 below expectancy
             in arithmetic; MA 1.8 below reading, and 1.9 below arithmetic
          4. More pronounced retardation in males
          5. Children in hospital school in excess of 10 months gained in
             reading and arithmetic achievement to extent expected for their
             mental ages

Authors:  Perna, Dunlap, & Dillard (1984)
Subjects: 53 males classified as mildly to moderately ED in public schools,
          age range 10-15 years (mean age 12.9 years)
Task:     Intellectual Achievement Responsibility (IAR), chronological age,
          Stanford-Binet IQ (S-B IQ) or WISC-R, California Achievement Test
          (CAT)
Results:  1. ED students who felt a high degree of self-responsibility for
             their successes and failures showed greater academic gains

Authors:  Reilly, Ross, & Bullock (1980)
Subjects: 177 adolescents adjudicated for specific behavioral offenses
Task:     Wechsler Intelligence Scale for Children-Revised (WISC-R), Wide
          Range Achievement Test (WRAT)
Results:  1. Average WISC-R IQ of 90.26. Near average scores on Picture
             Arrangement; lowest scores on Information, Comprehension,
             Vocabulary
          2. Average achievement was deficient in all areas. Arithmetic
             scores were consistently lower than reading; violent offenders
             had the lowest reading scores
          3. A relation between IQ and violent behavior was not found

Authors:  Schroeder (1965)
Subjects: 106 students classified in one of five categories (psychosomatic,
          aggressive, school difficulties, school phobia, neurotic-psychotic
          personalities); mean age 147.06 months
Task:     Wechsler Intelligence Scale for Children (WISC), Jastak Wide Range
          Achievement Arithmetic, Jastak Wide Range Achievement Reading
          (WRAT)
Results:  Mean scores consistently lower in arithmetic than reading in all
          five categories
          School difficulties category lowest mean achievement level in
          arithmetic and reading
          Highest grade equivalent composite mean in neurotic-psychotic
          category
          Emotionally disturbed children were retarded from age level in
          school achievement
          Educational disabilities concomitant with emotional disturbance
          and vice versa

Authors:  Scruggs & Mastropieri (in press)
Subjects: 50 BD and 28 LD students in grades 3-4
Task:     Training test-taking skills relevant to the Stanford Achievement
          Test (SAT), reading subtests
Results:  BD and LD students exhibited deficiencies on the SAT reading
          subtests. Test scores improved significantly with training

Authors:  Scruggs & Mastropieri (1984)
Subjects: 1480 LD and BD students in grades 1-3
Task:     Stanford Achievement Test, all subtests
Results:  Only slight differences between LD and BD groups, with LD students
          consistently higher in achievement
          Factor score patterns of LD and BD students were equivalent

Authors:  Scruggs, Mastropieri, & Tolfa (1985)
Subjects: 4[?] LD and 44 BD students in grades 4-6
Task:     Training test-taking skills relevant to the SAT reading and math
          subtests
Results:  Trained LD and BD students gained on the reading decoding subtest
          relative to controls
          Differential gain on the part of trained BD students over trained
          LD students on "math concepts" subtest

Authors:  Scruggs, Mastropieri, Tolfa, & Jenkins (1985)
Subjects: 37 BD students and 50 nonhandicapped students, grades 5-6
Task:     Test Attitude Scale (TAS)
Results:  BD and nonhandicapped students did not differ at the beginning of
          the school year
          After three days of testing, BD students reported lower attitudes
          in personal feelings and personal importance of tests, but did not
          differ with respect to attitudes concerning fairness of tests

Authors:  Stone & Rowley (1964)
Subjects: 82 boys and 34 girls, mean age 12 years, mean IQ 96.52 (range
          62-135)
Task:     Wide Range Achievement Test (WRAT), arithmetic and reading parts;
          Wechsler Intelligence Scale for Children (WISC)
Results:  In reading and arithmetic, majority of children fell below level
          of achievement expected on basis of chronological age
          In using mental ages as basis for determining achievement level,
          majority fell below expected level in both reading and arithmetic
          Emotionally disturbed children lower in arithmetic scores than
          reading scores (significantly)
          In actual grade placement, larger proportion were in grades below
          that expected on basis of CA

Authors:  Tamkin (1960)
Subjects: Children receiving residential treatment for emotional disorders
          in psychiatric hospital. 22 boys, mean age 8.7 years; 12 girls,
          mean age 9.4 years; combined mean age 9.0 years
Task:     Wide Range Achievement Test (WRAT), arithmetic and reading parts
Results:  Both arithmetic and reading grade rating within range commensurate
          with mean CA of sample
          Difference between grade ratings for reading and arithmetic was
          significant at .005 point based upon one-tailed test (t=2.91)
          32% (n=11) demonstrated some degree of educational disability, 41%
          (n=14) were educationally advanced, and remaining 27% (n=9) were
          at expected grade level, observing difference between CA and grade
          rating

Tamkin (1960), whose subjects included 34 children receiving residential
treatment for emotional disorders, reported both the arithmetic and reading
grade rating to be within the range commensurate with the mean chronological
age of the sample. Arithmetic achievement was significantly lower than
reading. Data from the Wide Range Achievement Test (WRAT) showed that 32%
demonstrated some degree of educational disability, 41% were educationally
advanced, and the remaining 27% were at expected grade level.

Stone and Rowley (1964) tested 116 children referred for psychiatric services 
using the WRAT. The majority of children fell below the expected level of 
achievement in reading and arithmetic on the basis of both chronological and 
mental ages. These children also scored significantly lower in arithmetic than 
reading. In actual grade placement, a larger proportion were in grades below 
those expected on the basis of chronological age. Likewise, Reilly, Ross, and 
Bullock (1980) reported that academic performance was deficient in all areas, 
with arithmetic scores consistently lower than reading. In addition, Reilly et 
al. (1980) reported that violent offenders had the lowest reading scores. In a 
related investigation, Bullock and Reilly (1979) reported lower achievement in 
all content areas on a similar sample of youthful offenders. Additionally, 
greatest achievement deficiencies were found for male, minority, and older 
subjects. 

Graubard (1964) compared the performance of 21 children in a psychiatric 
residential treatment center. Using the Metropolitan Achievement Test and the 
Stanford Achievement Test, he reported severe reading and arithmetic 
disability by comparing mental age to expected reading and arithmetic 
achievement. No evidence supporting a significant difference between reading 
and arithmetic achievement was found. 

Schroeder (1965) compared the WRAT scores of 106 students classified as 
having emotional problems (psychosomatic, aggressive, school difficulties, 
school phobia, or neurotic personalities). The mean scores were consistently 
lower in arithmetic than reading in all five categories. The school 
difficulties category included the lowest mean achievement level in arithmetic 
and reading. The highest grade equivalent composite mean was reported in the 
neurotic-psychotic category. Emotionally disturbed children were deficient at 
all age levels with respect to school achievement. Schroeder concluded that 
academic disabilities are concomitant with emotional disturbance and vice 
versa. 

Glavin and Annesley (1966) administered the California Achievement Test to 
90 normal boys and 130 behaviorally disturbed boys (who were further divided 
into conduct problem, withdrawn, and inadequacy-immaturity groups) in public 
school. Their findings showed that 81.5% of the behaviorally disordered 
group were underachieving in reading and 72.3% underachieving in arithmetic. 
Academic failure can be expected in a high proportion of delinquent or conduct 
disordered children according to the review of Silberberg and Silberberg 
(1971); Glavin and Annesley (1966) found no significant differences in 
performance between the conduct disordered and the withdrawn group. 

Motto and Lathan (1966) found no significant difference in the uniformity of 
achievement in reading and arithmetic of 47 school-age children from a state 
hospital. The children were below expectations based upon chronological and 
mental ages. However, they did find more pronounced retardation in males. 

Forness, Bennett, and Tose (1983) found similar results comparing 92 children 
who had been inpatients at a neuropsychiatric institute. Both boys and 
girls scored below expected levels on the Peabody Individual Achievement 
Test, although 12-year-old boys were lowest in reading recognition and reading 
comprehension. In a similar investigation (Forness, Frankel, Caldon, & Carter, 
1980), 34 hospitalized patients exhibited deficiencies in all academic areas, 
particularly math and spelling. 

Fuller and Goh (1981) compared 38 learning disabled and 42 emotionally 
disturbed public school children. The Wide Range Achievement Test scores of 
the learning disabled children were lower than those of the emotionally 
disturbed children on reading, spelling, and math. This was not so, however, 
on the Minnesota Percepto-Diagnostic Test, although no statistical tests were 
computed on the results. 

Harris and King (1982) compared academic achievement of children classified 
as having learning problems, behavior problems, learning and behavior 
problems, or "no problems". They studied scores of 242 public school children 
administered the Science Research Associates (SRA) Achievement Tests. 
Those children with learning problems scored lower than the children with no 
problems. Those with behavior problems did not differ from the no-problem 
category on the SRA subtests of Reading, Math, Science, and Use of Sources, 
but did differ from all groups on Language Arts and Social Studies. The 
learning and behavior problem group performed lower than all groups on the SRA. 

Epstein and Cullinan (1983) also found that for 16 matched pairs (IQ, sex, 
chronological age, ethnicity) of learning disabled and behaviorally disordered 
public school students, the behaviorally disordered students scored 
significantly higher than the learning disabled students on all subjects 
except the general information subtest of the Peabody Individual Achievement 
Test (cf. Reilly, Ross, & Bullock, 1980) and the math subtest of the Wide 
Range Achievement Test. These researchers suggested that differential academic 
programming may be indicated for learning disabled and behaviorally disordered 
children. 

In contrast, Scruggs and Mastropieri (1984) investigated the Stanford 
Achievement Test scores of 1,480 primary grade special education students 
(619 learning disabled and 863 behaviorally disordered) in several different 
content areas. They concluded that the learning disabled and behaviorally 
disordered children were, in fact, very similar with respect to academic 
performance, with learning disabled children scoring slightly but consistently 
higher than behaviorally disordered children. No consistent reading-math 
discrepancy was noted in either population. Also found was the fact that the 
variability of behaviorally disordered student performance descriptively 
exceeded that of learning disabled students; thus, a wider range of academic 
achievement among behaviorally disordered students may be expected. 

In contrast to the above studies, one investigation reported results which 
suggested that behaviorally disordered students do not exhibit academic 
deficiencies. Graubard (1971) examined the reading achievement and behavior 
checklist scores of 108 emotionally disturbed children and concluded, "all 
groups' reading commensurate with MA and several groups' reading commensurate 
with CA" (p. 757). Graubard added, however, that academic retardation 
in his sample was associated with severity of conduct disorders. 
Unfortunately, no data were offered to support these conclusions. 

Spelling 

Few studies in subjects other than reading and arithmetic have been 
conducted. Glavin and DeGirolamo (1966) found differences between withdrawn 
and conduct disordered students with respect to types of spelling errors. The 
withdrawn children made significantly more written spelling errors, while the 
conduct problem children made significantly more refusals (i.e., refused to 
complete the task). They concluded that children with emotional problems may 
show patterns of spelling errors which differ both quantitatively and 
qualitatively from those of normal children. In addition, as mentioned above, 
Fuller and Goh (1981) found that learning disabled students scored lower than 
emotionally disturbed students on tests of spelling achievement. 

Psychosocial Functioning and Academic Performance 

The present review of previous investigations can offer little evidence that 
the reported academic deficiencies of behaviorally disordered children are 
content specific; that is, research findings tend to support the notion that 
behaviorally disordered students are deficient in all areas of academic 
functioning, with some individual investigations reporting more serious 
deficits in math. Research which has examined academic performance in several 
different areas within one investigation has supported this conclusion (e.g., 
Scruggs & Mastropieri, 1984). However, several other researchers have 
investigated the interaction of academic performance and measures of 
psychosocial functioning. One major purpose of these investigations, described 
below, is to identify possible causal explanations for academic deficits. 

Glueck and Glueck (1950) reported that delinquents exhibited more dislike 
for school subjects requiring strict logical reasoning and persistency of 
effort as well as those dependent upon efficient memory skills. This finding 
may partially explain some of the previous reports of differentially low 
performance in math. School achievement of the delinquent students was far 
below that of nondelinquents. 

Graubard (1965) found that 35 delinquents incarcerated in a residential 
treatment center had similar communication patterns to those of 
nonadjudicated adolescents. The author maintained, however, that deficits were 
exhibited in the visual-motor channel (integration level). Delinquents also 
were reported to exhibit deficits in the Auditory Vocal Automatic modality and 
in directionality. Findings reported in this investigation, however, may be 
complicated by reliability and validity limitations of the measures 
administered (i.e., Illinois Test of Psycholinguistic Ability, Harris Test of 
Lateral Dominance). 

Two investigations examined locus of control and academic achievement 
with behaviorally disordered students. Hisama (1976) compared 48 special 
education students with learning and behavior problems to 48 nonhandicapped 
students on a locus of control measure. It was hypothesized that 
externality may be a factor for low achievement motivation of behaviorally 
disordered and learning disabled children. Hisama reported that the Children's 
Locus of Control Scale showed no difference in scores between normals and 
learning disabled and behaviorally disordered students. It was concluded that 
the child with learning and behavior problems may not be more externally 
oriented than the normal child. Perna, Dunlap, and Dillard (1984) found in a 
similar study that of 63 males classified as mildly to moderately emotionally 
disturbed, those students who felt a high degree of self-responsibility for 
their successes and failures (internality) showed greater academic gains. 

Letteri (1979) provided a "Cognitive Profile" associated with low academic 
achievement and severe behavior problems as a result of research efforts with 
200 subjects (some behaviorally disordered, some not). The cognitive 
processes associated with low achievement were said to include: simple (vs. 
cognitive complexity), leveler (vs. sharpener), intolerant for ambiguous 
information, global or field dependent (vs. analytical way of perceiving), 
broad (vs. narrow inconclusiveness in breadth of categorization), nonfocuser, 
and impulsive (vs. reflective). 

Four recent studies investigated attitudes and responses to achievement 
tests themselves. Scruggs, Mastropieri, Tolfa, and Jenkins (1985) examined 
attitudes expressed by behaviorally disordered students toward the test-taking 
experience. When surveys were administered at the beginning of the school 
year, reported attitudes of behaviorally disordered and more average students 
were very similar. When administered immediately after 3 days of testing, 
however, behaviorally disordered students reported more negative attitudes 
than their regular class counterparts. Taking a different perspective, Forness 
and Dvorak (1982) examined the general question of academic performance of 
disturbed or behaviorally disordered students under different testing 
conditions. The Comprehensive Test of Basic Skills under untimed conditions was 
used to test 40 adolescents who had been inpatients at a neuropsychiatric 
institute. Their scores were compared with scores obtained at the end of the 
normal time limits of the test. The only performance to increase under untimed 
conditions was that of reading comprehension. Similarly, Scruggs and 
Mastropieri (in press) trained a sample of mildly handicapped students, mostly 
behaviorally disordered, on test-taking skills and reported a significant 
performance advantage on reading subtests. This finding suggests that 
behaviorally disordered students may be deficient with respect to test-taking 
skills. In a more recent study, Scruggs, Mastropieri, and Tolfa (1985) 
reported that test-taking skills training of behaviorally disordered students 
had differentially raised scores on a "math concepts" subtest over those of 
learning disabled students to the extent that trained behaviorally disordered 
students gained 16 percentile points over their untrained counterparts. This 
finding may help explain why behaviorally disordered students' achievement 
scores in math are often differentially low. 

CONCLUSIONS 

The investigations reviewed in this paper represent a wide range of 
populations, all considered in some way behaviorally disordered. Different 
assessment measures have been used in a wide variety of different settings. In 
spite of the diversity of methods, measures, and population samples, however, 
some broad conclusions can be drawn and are given below. 

First, behaviorally disordered students consistently have been seen to exhibit 
academic and intellectual deficiencies. Although several investigations have 
examined the possibility of specific content area deficiencies, all evidence to 
date indicates that academic deficiencies exhibited by this population are 
global, with a smaller set of investigators suggesting arithmetic performance 
may be relatively lower than reading. In addition, deficiencies in academic 
areas have typically been greater than intellectual deficiencies. Investigators 
who examined ability/performance discrepancies in behaviorally disordered 
children have indicated that academic achievement is generally below levels 
predicted by ability tests. These consistent results suggest that the need for 
academic remediation in this population is as great as the need for behavior 
management and social skills training. 

Whether the reported academic deficiencies of behaviorally disordered 
students are greater than those typically exhibited by learning disabled 
students is less certain. Fuller and Goh (1981) and Epstein and Cullinan (1983) 
reported that learning disabled students scored lower on achievement measures, 
while Scruggs and Mastropieri (1984) reported that learning disabled students 
scored consistently higher. In spite of these discrepant findings, however, 
substantial academic deficiencies have been reported in both populations. In 
addition, behaviorally disabled students have exhibited consistently higher 
variability, due no doubt to the fact that learning disabled students are 
operating under an academic "cut off" level, while behaviorally disordered 
students are not. 

In addition, several variables have been identified which may partially 
explain observed academic deficiencies. These potentially related variables 
include attitude toward school subjects (Silberberg & Silberberg, 1971), 
external locus of control (Hisama, 1976; Perna, Dunlap, & Dillard, 1984), 
impulsivity (Letteri, 1979), and responses to test-taking situations (Forness 
& Dvorak, 1982; Scruggs & Mastropieri, in press; Scruggs, Mastropieri, & 
Tolfa, 1985; Scruggs, Mastropieri, Tolfa, & Jenkins, 1985). Many of these 
investigations simply describe characteristics of this population, however, 
and do not provide information that these variables are, in fact, causally 
related. Further research is needed to document more carefully the reasons for 
the observed academic deficiencies. 

Finally, it must be noted that research concerned with optimal instructional 
strategies for this population has been greatly neglected, given the nature and 
extent of the problem. Epstein, Cullinan, and Rose (1980) referred to academic 
remediation of behaviorally disordered students as an area "of great concern to 
special education practitioners, but, ironically, of less concern to 
researchers" (p. 64). They described the several investigations which had been 
conducted, virtually all of which examined the role of token reinforcement in 
increasing academic performance. Although some initial research has been 
conducted which appears promising in evaluating the effect of such other 
instructional variables as corrective feedback (e.g., Polsgrove, Reith, 
Friend, & Cohen, 1980), increased instructional time (e.g., Reith, Polsgrove, 
Semmel, & Cohen, 1980), self-management (e.g., Cohen, Polsgrove, & Reith, 
1980), peer tutoring (Scruggs, Mastropieri, & Richter, 1985), and cooperative 
versus competitive learning (Scruggs & Mastropieri, 1985), further research is 
needed to refine these variables and to identify other variables effective in 
remediating the serious academic deficits of this population. 
Academic Characteristics of 
Behaviorally Disordered and 
Learning Disabled Students 

Thomas E. Scruggs and Margo A. Mastropieri 

ABSTRACT 

The academic performance of 1,480 behaviorally disordered and learning disabled 
children attending grades 1-3 was compared. Results indicated that differences 
in academic performance between behaviorally disordered and learning disabled 
students were trivial. In addition, supplementary analyses indicated that the 
two groups did not differ with respect to factor structure of achievement test 
performance, nor did they differ with respect to reading-math correlations. 
Implications with respect to cross-categorical education are discussed. 

The issue of cross-categorical versus categorical placement in special 
education has been the subject of much debate in past years (e.g., Hallahan & 
Kauffman, 1976; Heward & Orlansky, 1983; Hewett & Forness, 1974). This issue 
generally involves presumed similarity or dissimilarity in behavioral, 
cognitive, or academic functioning. If students classified variously as 
learning disabled (LD) or behaviorally disordered (BD) are thought to be 
similar with respect to such variables, cross-categorical placement is 
generally recommended. If, on the other hand, empirical evidence identifies 
substantive differences between the two groups, differential placement could 
be considered more appropriate. Unfortunately, available data are not 
conclusive, for similarities as well as dissimilarities have been reported 
between LD and BD population samples. 

With respect to intellectual functioning, BD and LD students appear to be quite 
similar. Mastropieri, Jenkins, and Scruggs (in press) review available 
literature on intellectual functioning of BD students and conclude that BD 
students typically are found to function in the low to mid-90s range, that is, 
slightly below average. These findings are very similar to those reported for 
LD students. Kavale and Forness (1984) evaluated 94 studies of LD students and 
reported an overall mean IQ of 97. Likewise, Cone, Wilson, Bradley, and Reese 
(1985) evaluated IQ scores of a large sample of LD students and reported an 
overall mean IQ score of 9[illegible]. Such data indicate that LD and BD 
students are similar with respect to intellectual functioning. With respect to 
cognitive functioning, it has been hypothesized that LD children suffer with 
respect to one or more "psychological processing" deficiencies, perhaps 
reflecting auditory or visual perception (e.g., Johnson & Myklebust, 1967). To 
date, however, strong empirical support that LD students differ from other 
populations in this regard is lacking (Kavale & Forness, 1985), and the few 
attempts to assess "processing" deficiencies of BD students (e.g., Graubard, 
1965) are complicated by the validity and reliability limitations of the 
measures used. 

With respect to social/behavioral functioning, Hallahan and Kauffman (1976) 
described research which suggests that LD and BD students exhibit similar types 
of behavior, but differ somewhat with respect to the level of behavior problems 
exhibited, with BD students exhibiting behavior problems at a higher level. 
With respect to perceptions of social functioning by others, both populations 
have been viewed in more negative terms than average students, with Antonak 
(1980) reporting high levels of rejection for BD students, and Bryan and Bryan 
(1981) reporting similarly high levels of rejection for LD students. A recent 
reanalysis of the LD "socialization" literature, however, has suggested that 
these students may be merely more "at risk" for social rejection than their 
nondisabled peers (Dudley-Marling & Edmiaston, 1985). 

This research was supported in part by a grant from the U.S. Department of 
Education, Research in the Education of the Handicapped, No. G0083O0U06. The 
authors would like to thank Marilyn Tinnakul, Ursula Pimentel, and Mary Ellen 
Heiner for their assistance in the preparation of this manuscript. Requests 
for reprints should be addressed to the first author. 
In summary, it has been seen that BD and LD students appear to exhibit similar 
intellectual and cognitive characteristics, and appear to differ somewhat with 
respect to social behaviors and social status. An additional variable of 
importance, and one which forms the basis of the present investigation, is 
level of academic functioning. Recently, Epstein and Cullinan (1983) argued 
convincingly that the level of academic functioning of behaviorally disordered 
students was, in fact, significantly higher than that of corresponding learning 
disabled students. These researchers matched 16 pairs of learning disabled and 
behaviorally disordered students for chronological age, IQ, sex, and 
ethnicity, and administered to all students several achievement measures. They 
concluded that with chronological age and IQ so matched, BD students were 
significantly higher than LD students in all subtests with the exception of 
the General Information subtest on the Peabody Individual Achievement Test and 
the Math subtest of the Wide Range Achievement Test (BD students, however, had 
scored significantly higher on the Mathematics subtest of the PIAT). These 
significant differences amounted to over a 1-year difference in grade level 
scores, leading the authors to suggest that "such differences could present 
problems related to grouping and other instructional considerations" (Epstein 
& Cullinan, 1983, p. 305). They concluded, "these data give no support to the 
supposition that the traditional categories of mild-moderate educational 
handicaps are highly similar on the characteristics of academic achievement" 
(p. 305). 

The results of the Epstein and Cullinan investigation provide valuable 
information regarding relative achievement discrepancies of BD and LD students. 
Some limitations of this study, however, were noted by the authors. These 
include, among other things, the facts that relatively small samples of 
students were employed and that no girls or minority pupils were included in 
the sample. To these above stated limitations could be added another: The 
conclusions of Epstein and Cullinan refer to only a small sample of LD and BD 
students matched on IQ, and provide little information concerning academic 
achievement levels of large numbers of such students actually enrolled in 
public school special education classes. 

The use of IQ data in investigating the academic characteristics of 
behaviorally disordered students has been employed frequently in the past 
(Forness, Bennett, & Tose, 1983; Graubard, 1964, 1971; Motto & Wilkins, 1968). 
Kauffman (1981) has noted that use of IQ data on behaviorally disordered 
students is critical for effectively assessing the academic characteristics of 
this population. Although matching on IQ with the behaviorally disordered and 
other populations does provide information regarding relative discrepancies 
between ability and academic performance of the behaviorally disordered 
population, it does not describe the actual level of academic performance 
exhibited by behaviorally disordered students enrolled in special education 
classes and how this performance differs from that of their learning disabled 
counterparts. 

The Epstein and Cullinan (1983) study is most informative regarding the 
relative ability/academic performance discrepancy of their sample of the two 
populations, but provides little information regarding the direct comparison 
of learning disabled and behaviorally disordered students on measures of 
academic functioning. The present investigation was intended to investigate 
this issue by examining the achievement test scores of a large sample of LD and 
BD children as they were enrolled in special education classrooms. Through 
this procedure, it was thought that evidence could be acquired regarding 
possible academic differences in performance between these two populations. 

METHOD 

Data were collected from 1,480 students in grades 1-3 attending special 
education classrooms in 58 elementary schools in a western metropolitan area. 
(Average Total Battery percentile scores for the Stanford Achievement Test for 
all students in the district represented by these schools were 65, 60, and 
[illegible], respectively, for grades 1, 2, and 3.) Of the special education 
population, 95% were Anglo, and 5% represented minority groups including 
Black, Hispanic, and Native American; 68.3% (1,012) were males, and 31.3% 
(470) were females; 382 students were attending first grade, 529 students were 
attending second grade, and 571 students were attending third grade. Average 
age of subjects was very similar for LD versus BD students, respectively: for 
first graders, mean age was 7 years, 1 month (SD = 5.1 months) and 7 years, 
1 month (SD = 5.4 months); for second graders, 8 years, 1 month (SD = 6.2 
months) and 8 years, 1 month (SD = 5.6 months); for third graders, 9 years, 
2 months (SD = 6.2 months) and 9 years, 2 months (SD = 6.0 months). 
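
The sample description above can be cross-checked with simple arithmetic. The sketch below is not part of the original report; it only re-adds the counts printed in the paragraph. Under those counts the grade and sex totals both come to 1,482 rather than the stated 1,480, and the female share works out to about 31.7%, so the printed 31.3% may reflect a misprint or a scanning error.

```python
# Counts exactly as printed in the Method paragraph above.
by_grade = {"first": 382, "second": 529, "third": 571}
by_sex = {"male": 1012, "female": 470}


def percentages(counts):
    """Return each count as a percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    print("total by grade:", sum(by_grade.values()))
    print("total by sex:  ", sum(by_sex.values()))
    print("sex shares (%):", percentages(by_sex))
```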

According to PL 94-142 and local criteria, 619 (42%) were classified as LD and 
863 (58%) were classified BD. These criteria included, for LD students, average 
or above intelligence (i.e., over 84 on a Wechsler Intelligence Scale for 
Children-Revised or a Stanford-Binet) and a 40% discrepancy between ability and 
achievement in one or more of the following areas: (a) oral expression, 
(b) listening comprehension, (c) written expression, (d) basic reading skills, 
(e) reading comprehension, (f) mathematics calculation, or (g) mathematics 
reasoning. In addition, if the discrepancy is thought to be due to emotional 
disturbance, an LD classification cannot be given. 
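
The LD criteria just described amount to a simple decision rule, which can be sketched as follows. This is an illustration only: the report does not say how the 40% discrepancy was computed, so the sketch takes a discrepancy fraction per area as given. The `Student` class, its field names, and the use of "at least 40%" as the threshold are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# The seven areas listed in the criteria above.
AREAS = (
    "oral expression",
    "listening comprehension",
    "written expression",
    "basic reading skills",
    "reading comprehension",
    "mathematics calculation",
    "mathematics reasoning",
)


@dataclass
class Student:
    """Hypothetical record; not a structure used in the original study."""
    iq: int
    # area name -> ability/achievement discrepancy as a fraction (0.40 = 40%)
    discrepancies: dict = field(default_factory=dict)
    due_to_emotional_disturbance: bool = False


def meets_ld_criteria(student, iq_floor=84, min_discrepancy=0.40):
    """Apply the three conditions stated in the text, in order."""
    if student.due_to_emotional_disturbance:
        return False  # an LD classification cannot be given in this case
    if student.iq <= iq_floor:
        return False  # the text requires a score over 84
    # Assumption: "a 40% discrepancy" is read as 40% or more in any one area.
    return any(student.discrepancies.get(a, 0.0) >= min_discrepancy for a in AREAS)
```

For example, `meets_ld_criteria(Student(95, {"basic reading skills": 0.45}))` evaluates to `True`, while the same record with an IQ of exactly 84 does not qualify.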

Criteria for classification as behaviorally disordered included marked deficits 
in behavioral and/or emotional functioning documented by teacher and 
psychologist which have proven resistant to simpler remediation and which have 
adversely affected educational performance. In contrast to the LD 
classification criteria, no intellectual or academic criteria are specified 
for BD students. 1,347 students (91%) were attending resource room placements, 
while 135 students (9%) were attending self-contained classrooms. IQ data for 
this population were not available and in fact were not solicited for the 
purposes of this study. Data were collected on the subjects for subtests of 
the 1973 edition of the Stanford Achievement Test (SAT; Madden, Gardner, 
Rudman, Karlsen, & Merwin, 1973). All test data were collected from the same 
administration during spring 1983. 

RESULTS 

Main Analyses 

Multivariate analysis of variance (MANOVA) tests were computed between groups 
at each grade level, with raw scores from the SAT subtests as dependent 
measures. The MANOVA procedure was used to take into account the high level of 
intercorrelations between subtests, and to control for an inflated 
experiment-wise alpha level thought likely to result from repeated t tests on 
nonindependent comparisons (Kerlinger & Pedhazur, 1973; Levin, in press; 
Marascuilo & Levin, 1983; Winer, 1971). Raw scores rather than grade 
equivalents or percentiles were computed because the ratio nature of the 
numbers was more appropriate for meeting the assumptions of analysis of 
variance (Ferguson, 1981) and because raw scores provide a more precise 
measure of test behavior. 

Analysis of the data revealed a significant multivariate F approximation of 
[illegible], significant at the .001 level, for second graders; a significant 
multivariate F approximation of 2.20, p = .033, for third graders; and a 
nonsignificant multivariate F approximation of .87, p = .48, for first 
graders. Visual inspection of the descriptive data presented in Table 1 
indicates that the achievement scores consistently favor the LD group over the 
BD group, although the effect sizes are small enough in all cases to 
constitute questionable practical educational importance (Total Battery effect 
sizes of .14, .18, and .08 for first, second, and third graders, 
respectively). These differences rarely exceed 3 or 4 months in grade 
equivalent scores. 

The finding of a nonsignificant multivariate effect in the first grade sample 
precluded further analysis with univariate tests (Marascuilo & Levin, 1983). 
However, univariate t tests were computed on the second and third grade 
levels, for which significant multivariate effects had been found. To control 
for the possibility of Type I errors, specific pairwise comparisons were made 
at a level of significance appropriate to a familywise alpha level of .05 for 
each grade level. 

It can be argued that multiple t tests on nonindependent data sets do not 
inflate the Type I error probability as much in actual practice as expected by 
statistical theory, and in fact some recent Monte Carlo studies have supported 
this argument (Bernhardson, 1975; Carmer & Swanson, 1973; White, 1984). The 
decision made here was to use the more conservative procedure, especially 
considering the fact that the large sample size allowed sufficient power to 
detect relatively small differences even when the pairwise alpha level was 
quite small (Cohen, 1977). 
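
The report gives a familywise alpha of .05 and a per-comparison alpha of .007 for the 7 subtests, but does not name the adjustment used. As a hedged illustration, the sketch below shows two standard adjustments, Bonferroni and Šidák, both of which round to .007 in this case; neither is confirmed as the authors' method, and the function names are chosen for this example.

```python
def bonferroni_alpha(familywise_alpha, n_comparisons):
    """Per-comparison alpha under the Bonferroni adjustment."""
    return familywise_alpha / n_comparisons


def sidak_alpha(familywise_alpha, n_comparisons):
    """Per-comparison alpha under the Sidak adjustment."""
    return 1 - (1 - familywise_alpha) ** (1 / n_comparisons)


if __name__ == "__main__":
    # 7 subtests per grade level, familywise alpha of .05 (values from the text).
    print(round(bonferroni_alpha(0.05, 7), 3))
    print(round(sidak_alpha(0.05, 7), 3))
```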

TABLE 1

Descriptive Data and Statistical Comparisons

                                   BD                       LD
                          Percentile  Grade        Percentile  Grade                  Effect
                                      Equivalent               Equivalent    t(a)     size

First grade               (N = 253)                (N = 129)
  Total reading              20         1.4           23         1.5         (c)      -.09
  Total math                 18         2.3           24         1.4         (c)      -.15
  Vocabulary                 23         1.0           30         1.4         (c)      -.16
  Listening comprehension    16         K.6           22         K.9         (c)      -.16
  Total                      12         1.1           18         1.3         (c)      -.14

Second grade              (N = 323)                (N = 206)
  Total reading              26         2.0           28         2.0          .55     -.05
  Total math                 26         1.9           34         2.1         2.16     -.20
  Vocabulary                 16         1.8           26         2.2         2.95(b)  -.27
  Listening comprehension    11         1.3           20         1.9         3.31(b)  -.30
  Spelling                   12         1.6           16         1.8         1.99     [illegible]
  Social science             14         2.0           28         2.2         3.43(b)  -.31
  Science                    14         1.5           24         2.1         3.10(b)  -.29
  Total                      18         1.6           26         1.8         1.99     -.1[illegible]

Third grade               (N = 287)                (N = 284)
  Total reading              23         2.5           24         2.5          .14     -.01
  Total math                 16         2.9           18         3.0         1.51     -.13
  Vocabulary                 24         2.5           24         2.9         2.00     -.1[illegible]
  Listening comprehension    20         2.5           24         2.8         1.61     -.14
  Spelling                   13         2.5           12         2.5         -.57      .05
  Social science             22         2.8           22         2.8          .50     [illegible]
  Science                    12         2.4           16         2.6         -.08      .01
  Total                      24         2.7           24         2.7          .95     -.08

(a) All t statistics were computed on raw scores.
(b) Statistically significant at the prespecified probability level, p < .007.
(c) Because of a nonsignificant multivariate effect, univariate statistics were not computed.

In the case of the 7 subtests on the second and third grade levels, the resulting alpha was
.007. By these rather rigid criteria, significant differences favoring the LD group were
nonetheless found at the second grade level for the Vocabulary, Listening Comprehension,
Social Science, and Science subtests. Differences between groups in Total Math and
Spelling approached significance, but not at the level required by this analysis. Differences
in reading were negligible, t < 1 in absolute value. At the third grade level, no comparisons
approached significance at the required level, and 4 of the 7 comparisons resulted in t's < 1
in absolute value. The fact that a significant multivariate effect but no univariate effects were
found is not uncommon and is doubtless a result of the fact that the MANOVA takes into
account the high level of correlations between subtests, while the univariate tests do not
(Winer, 1971).
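
The pairwise criterion of .007 reported above is what one obtains by dividing the familywise alpha of .05 by the 7 subtest comparisons. The following is only a minimal sketch of that arithmetic. The source does not name the correction, so treating it as a Bonferroni-style division is an assumption; the function name `pairwise_alpha` is hypothetical; the critical value uses a normal approximation to the t distribution, which is reasonable here only because each group numbers in the hundreds; and the t values are the second grade entries of Table 1.

```python
from statistics import NormalDist


def pairwise_alpha(familywise_alpha: float, n_comparisons: int) -> float:
    """Per-comparison alpha under a Bonferroni-style division (assumed)."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return familywise_alpha / n_comparisons


alpha = pairwise_alpha(0.05, 7)

# Two-tailed critical value, normal approximation to t (assumption).
critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)

# Second grade t statistics for the 7 subtests, as listed in Table 1.
second_grade_t = {
    "Total reading": 0.55,
    "Total math": 2.16,
    "Vocabulary": 2.95,
    "Listening comprehension": 3.31,
    "Spelling": 1.99,
    "Social science": 3.43,
    "Science": 3.10,
}

# Subtests whose |t| exceeds the critical value at the pairwise alpha.
significant = [name for name, t in second_grade_t.items() if abs(t) > critical]

print(round(alpha, 3))
print(significant)
```

Under these assumptions the flagged subtests are the same four that the text reports as significant, while Total Math and Spelling fall short of the stricter criterion.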

Supplementary Analysis 

Since statistical differences between BD and LD students were seen to be few, resulting in
small effect sizes, supplementary analyses were computed to determine whether the
patterns of achievement test performances could be seen to be different for the two groups.
To this end, separate factor analyses were computed for BD and LD students at each grade



Behavioral Disorders May 1986



187 



level in order to determine whether the groups differed from each other with respect to
underlying factor structure. Using Kaiser's criterion for factor limitation (Marascuilo &
Levin, 1983), each of the six separate factor analyses revealed only one factor, accounting
for between 81 and 88% of total variance, and indicating that over all subtests, only one factor
was being measured for each group (perhaps a "general cognitive ability" factor), and that
no difference in factor structure between BD and LD groups was discernible. In a follow-up
analysis, individual correlations were computed between Total Reading and Total Math
subtests for BD and LD students at each grade level. Resulting correlations ranged from .78
to .88 (all p's < .01) for all groups. Comparisons made via Fisher's Z transformations
(Ferguson, 1981) at each grade level indicated that at no point were correlations for BD
students statistically different from correlations for LD students (all p's > .20).
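
The comparison of BD and LD correlations via Fisher's Z transformation can be sketched as follows. This is an illustration under stated assumptions, not the authors' computation: the function name `compare_independent_correlations` is hypothetical, the two correlations (.82 and .84) are made-up values chosen from within the reported .78 to .88 range, and the group sizes are the second grade Ns from Table 1.

```python
from math import atanh, sqrt
from statistics import NormalDist


def compare_independent_correlations(r1, n1, r2, n2):
    """Two-tailed z test for the difference between two independent
    Pearson correlations, using Fisher's r-to-Z transformation."""
    z1, z2 = atanh(r1), atanh(r2)          # Fisher Z of each correlation
    se = sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # standard error of Z1 - Z2
    z = (z1 - z2) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical correlations; only the range .78 to .88 is reported.
z_stat, p_value = compare_independent_correlations(0.82, 323, 0.84, 206)
print(round(z_stat, 2), round(p_value, 2))
```

With correlations this close together and samples of this size, the test does not approach significance, which is the pattern the text describes.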

DISCUSSION

Results of the present investigation suggest that, for this sample of primary grade students,
BD students do not exhibit academic performance superior to their LD age peers when
academic achievement scores of students attending special education placements are
examined. These findings contrast with those of Epstein and Cullinan (1983), who
suggested that academic performance of BD students may be higher than that of LD age peers.
The reason for these discrepant findings may be that the Epstein and Cullinan subjects were
matched by IQ, while the subjects in the present investigation represented the total number
in a sample of students enrolled in LD and BD classes without respect to intellectual
functioning. It must also be emphasized that Epstein and Cullinan studied intermediate
aged students from self-contained settings, while the present investigation evaluated the
academic performance of primary aged students, most of whom were attending resource
rooms.

While the findings of Epstein and Cullinan are of theoretical importance in that they
underline differences in performance discrepancies between the two populations in the
sample selected, they do not provide direct evidence concerning how a large sample of
these students actually functions in classes compared with their learning disabled
counterparts. The results of the present investigation suggest that, at least at the primary grade
levels in the population sampled, LD and BD children are in fact very similar with respect to
academic performance. Even though statistically significant differences were found on
some comparisons, it must be remembered that the large sample size resulted in sufficient
statistical power to discern relatively small effect sizes (Cohen, 1977). In fact, for Total
Reading, Total Math, or Total Battery scores, these differences did not exceed two months
in grade equivalent scores.

A case may be made that although academic functioning appears similar given a static
achievement test measure, the populations may differ with respect to rate of learning. If this
were true, however, one would expect the BD students to begin to surpass the LD students
academically by the second or third grade. Such differences over grade levels, however,
were not observed.

Although the sample size used in this investigation was relatively large, it should be
recalled that the subjects came from only one geographical area. This fact may present
problems in generalization of findings. However, it must also be maintained that the
standards for inclusion in special education placement in this area are very similar to criteria
used around the country. In fact, these criteria make the findings more surprising in that
specific ability/performance discrepancies in areas of academic functioning are necessary
requirements for LD placement while they are not for BD placement. Nevertheless, the
strong similarities between the two groups indicate that, for one reason or another, many
LD and BD students in the primary grades apparently do function on a highly similar
academic level. This finding does not support the suggestion of Cullinan, Lloyd, and
Epstein (1981) that academic deficits may be minimal in the primary grades and increase
with age. It was found, however, that the variability of BD student performance descriptively
exceeded that of LD students at all grade levels. Such higher levels of variability on the part
of BD students have been reported by Forness et al. (1983). Although the relatively higher
descriptive level of variability here may simply be an artifact of the fact that an academic
cutoff level was operating for LD but not BD students, it does suggest that a special
education teacher may expect to find a wider range of academic achievement among BD
students.

In contrast to the Epstein and Cullinan (1983) investigation, no evidence is given by these
data that academic programming should proceed differentially for the two groups. However,
the fact that two groups are functioning at a similar academic level does not necessarily
mean that instructional procedures should be the same. For example, the "Hewett Model"
(Hewett, 1968), which relies strongly upon independent seat work, has been a popular
model for BD classrooms, while the Direct Instruction model, which relies upon
teacher-learner interactions, has been found to be effective with LD students (Lloyd, Cullinan,
Heins, & Epstein, 1980). Whether such different procedures are differentially effective for LD
versus BD students, however, has not been determined. At present it must be concluded
that little is known about optimal instructional strategies for LD versus BD children, and it is
the opinion of the present authors that research is greatly needed in this area.

The reasons these two supposedly different groups function similarly in academic
performance are uncertain and cannot be given on the basis of the data presented here. It has
often been stated in practice by those who work with LD and BD children that the causal link
between behavior problems and learning disabilities is a strong one whose directionality is
often in question. It may be that the causal relation between learning and behavioral
disabilities is of sufficient strength that academic shortcomings are a frequent
consequence regardless of the nature of special education classification.

In spite of the apparent discrepancies between the present investigation and the Epstein
and Cullinan (1983) study, we would like to end on a note of concordance with those
researchers. In our view, Epstein and Cullinan are quite correct in their assertion that
effectiveness of service is a much higher priority than the categorical versus
cross-categorical nature of that service, an assertion for which empirical support is available
(Heller, Holtzman, & Messick, 1982). Although the present data suggest that
cross-categorical placement may be advisable, the present authors would rather see effective
educational programming in categorical settings than ineffective teaching in cross-categorical
settings. It is thought, however, that the search for optimal educational settings can parallel
the search for optimal educational strategies within such settings, and it is to those ends that
the present research was addressed.

PREDICTING OUTCOME THROUGH ACHIEVEMENT:
ACADEMIC GAINS DURING PSYCHIATRIC HOSPITALIZATION 
AS A MEASURE OF SCHOOL PROGRESS AT FOLLOW-UP 



Abstract 

Although academic achievement of behavior disordered pupils
is less frequently studied than their social adjustment and
is consequently less well established, knowledge of their academic
progress is important to the development of sound educational
planning. In this study, academic gains of 150 children and
adolescents hospitalized for psychiatric disorders were used
to predict subsequent school outcome as measured by teacher
ratings in their post-discharge classrooms. Effects of IQ and
severity of diagnostic disorder were also examined, as was the
type of classroom placement. Significance of these gains in
academic achievement was discussed in relation to educational
planning for behavior disordered pupils.



PREDICTING OUTCOME THROUGH ACHIEVEMENT: 
ACADEMIC GAINS DURING PSYCHIATRIC HOSPITALIZATION 
AS A MEASURE OF SCHOOL PROGRESS AT FOLLOW-UP 

Although professional concern usually focuses on social 
adjustment of behavior disordered children in classroom settings, 
there is growing interest in academic achievement of these children 
as well. Part of this interest is reflected in the teacher's 
need to choose appropriate academic materials in order to avoid 
adverse responses to frustration or boredom on the part of the 
behavior disordered pupil (Forness, 1983). Establishing correct 
achievement levels is viewed, in this context, as an important 
antecedent or respondent conditioning approach to managing the 
behavior of such children (Hall, Delquadri & Harris, 1977). 

The nature and extent of academic deficits of behavior
disordered children, however, remain relatively unclear. The
incidence of significant underachievement of children in public
school or clinic programs for the behavior disordered has been
reported to vary widely (Glavin & Annesley, 1971; Wright, 1974;
Scruggs & Mastropieri, 1985) as it has in residential programs
(Barnes & Forness, 1982; Forness, Bennett & Tose, 1983; Stone
& Rowley, 1964). Changes in academic achievement as a result
of intervention with behavior disordered children are likewise
difficult to determine, both in terms of manipulation of testing
situations (Finer & Forness, 1984; Forness & Dvorak, 1982; Scruggs
& Mastropieri, in press a) and in response to classroom placement
or treatment outcome (Abidon & Selzer, 1981; Ashcraft, 1971;
Calhoun & Elliott, 1977; Scruggs & Mastropieri, in press b;
Vacc, 1972).

Predicting Outcome Through Achievement ... 2

An important but relatively unexamined question in this 
area, however, concerns the significance of gains in academic 
achievement in this population. Does, for example, the academic 
progress of a behavior disordered student in a particular classroom 
setting relate to anticipated classroom needs or subsequent 
school progress? The answer to this question is important for 
a number of reasons, not the least of which is the need for 
establishing appropriate academic goals and related data for 
each behavior disordered pupil's individual educational plan 
(Forness, 1979; McGinnis, Kiraly & Smith, 1984). A preliminary 
study of school outcome as predicted through academic gains 
during treatment (Forness, Cronin & Lewis, 1981) demonstrated 
that gains in math but not in reading achievement were positively 
related to subsequent teacher ratings of adjustment. This study 
bears replication, however, in that complete follow-up data 
were available on only 25 adolescents, and no younger children 
were included in the sample. 

The present study addresses the issue of predicting subsequent 
school adjustment of children and adolescents, using academic 
gains during treatment as a predictive variable. It further
examines this question in a group of children who, although 
hospitalized for psychiatric treatment or evaluation, returned 
to a variety of mainstreamed and special classroom settings 
in the community. The sample was large enough to examine effects 
of age, intelligence, severity of diagnosis, length of treatment, 
and type of subsequent classroom placement on outcome, as rated 
by classroom teachers at follow-up.

Method 

Subjects for the study were selected from a total population 
of 342 subjects aged 8 to 17 years. These subjects were admitted 
to two inpatient wards in the UCLA Neuropsychiatric Institute 
(NPI) over a 4-year period, from January 1981 to January 1985. 
All were hospitalized for serious behavior disorders, and a 
complete description of the hospital treatment program and school 
approaches is provided in Forness (1983). 

It should be mentioned briefly that psychiatric treatment 
on these wards was individualized for each subject and included 
a combination of short-term psychodynamic, family therapy, and 
behavioristic treatment approaches. Trials of psychotropic 
medication were used when indicated. Each subject was given
from two to three therapy sessions each week by psychiatry residents 
in training, including a family therapy session along with staff 
social workers. Nursing staff used behavioral approaches for 
management of social behavior, and each child or adolescent 
attended four to six sessions of occupational and recreational 
therapy each week. The hospital school program was based on 
individualized instruction in a group setting with behavioristic 
approaches for motivation and management of classroom behavior.
Subjects attended 3 hours of school daily. Their length of 
hospitalization was 2 to 3 months on the average. 



Achievement testing of each child or adolescent was done 
during the first week of hospital admission and again during 
the last week before discharge. All tests were administered
by certified classroom teachers. The achievement test used
was the Comprehensive Test of Basic Skills (CTB/McGraw-Hill,
1973); and two subtests, Reading Vocabulary and Mathematics 
Computation, were used in the data analysis. Alternate forms 
of the test were used in pre- and post-testing. 

Intelligence testing was completed by licensed clinical 
psychologists at NPI or by psychometrists working under their 
direction. IQ tests used were either the WISC-R or the
Stanford-Binet. Full scale IQ was used in data analysis. Psychiatric
diagnoses were obtained from each subject's medical chart and 
rated according to a five-point scale of severity, derived from 
a study of diagnostic-related groupings in a similar population 
(Forness, Sinclair, Alexson, Seraydarian & Garza, 1985). In 
this scale, attention deficit disorders are rated as 1, conduct 
or adjustment disorders as 2, affective or somatic disorders 
as 3, borderline or other personality disorders as 4, and schizo- 
phrenic or psychotic disorders as 5. 

Although 342 subjects had been admitted over the study 
period, complete sets of scores were unavailable on a number
of subjects who were discharged before post-testing could be
completed or who did not receive IQ testing; but comparison
of the scores of these subjects with the remaining subjects
in the population did not reveal any systematic bias in sex,
age, length of stay, or pretest achievement levels. Achievement
gain scores for all subjects were computed by subtracting each
subject's achievement standard scores at admission from those
obtained at discharge.
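
The gain-score computation described above amounts to a per-subject difference of standard scores, discharge minus admission. The sketch below illustrates it under stated assumptions: the class name `AchievementRecord`, its field names, the subject identifiers, and the scores themselves are all hypothetical, since no individual scores are reported.

```python
from dataclasses import dataclass


@dataclass
class AchievementRecord:
    """One subject's standard scores on a single subtest (hypothetical layout)."""
    subject_id: str
    admission_standard: float
    discharge_standard: float

    @property
    def gain(self) -> float:
        # Gain = score at discharge minus score at admission.
        return self.discharge_standard - self.admission_standard


# Made-up example records; not data from the study.
records = [
    AchievementRecord("S01", admission_standard=410.0, discharge_standard=432.0),
    AchievementRecord("S02", admission_standard=455.0, discharge_standard=449.0),
]

gains = {rec.subject_id: rec.gain for rec in records}
print(gains)
```

A negative gain, as for the second made-up record, simply means the discharge score was lower than the admission score.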

In order to obtain a follow-up measure of each child or 
adolescent's classroom performance in the public school after 
discharge from the hospital, forms were mailed to his or her 
receiving classroom teacher in the public school after the subject 
had been discharged for at least 2 months but for less than 
4 months. These forms were approved by the UCLA Human Subject
Protection Committee, and informed consent letters were signed
by parents or guardians. The forms included a section in which
the teacher could indicate the current type of classroom the
child was attending, as follows: (1) full-time unassisted placement
in regular class, (2) regular class placement with resource
or designated services, (3) primary placement in a special day
class for learning handicapped students, (4) primary placement
in special day class for seriously emotionally disturbed students,
or (5) primary placement in a residential setting. These categories
were treated as a scale of classroom placement severity, with
1 being the least restrictive and 5 being the most restrictive.
The forms also contained rating scales upon which the teachers
could make two overall ratings of the children or adolescents'
academic and social adjustment in their classroom at that point.
The teachers were asked to rate subjects on a 5-point scale
on both academic adjustment and socialization relative to other
pupils in the same classroom. The 5 points on each scale were
(1) much worse than, (2) slightly worse than, (3) about the
same as, (4) slightly better than, and (5) much better than
the average child or adolescent in the placement classroom.
Stamped self-addressed envelopes were included for returning
these rating forms to the hospital. (Copies of the forms and
consent letters are available upon request.)

Results 

Complete data and consent letters were obtained for 212
of the 342 subjects admitted over the 4-year period. Of the
213 forms mailed, 150 were returned, a rate of 71%. The mean
age of the sample was 000 years, with a range of 8 to 17 years
and a standard deviation of 000 years. Of the 150 subjects,
00% were males and 00% were females. The mean length of stay
was 000 months, with a range of 000 to 000 months. Mean IQ
was 000 (range 00 to 000, SD = 000). Comparison of the means
and standard deviations of this sample with those of the total
population of 342 subjects did not reveal any significant differences
in age, length of stay, or pretest achievement levels.

Table 1 provides the mean, range, and standard deviation 
of the pre- and post-achievement subtests in both standard scores
and grade-level scores, although standard scores were used in
subsequent data analyses. Note that, on the average, these
subjects made substantial gains in reading and mathematics. 

INSERT TABLE 1 ABOUT HERE



In regard to severity of psychiatric diagnoses, there were
00 subjects in category 1 (attention deficits); 00 in category
2 (adjustment or conduct disorders); 00 in category 3 (affective
or somatic disorders); 00 in category 4 (borderline or other
personality disorders); and 00 in category 5 (schizophrenic
or psychotic disorders). There were 00 subjects unassisted
in regular classrooms, 00 in regular classes with resource or
designated services, 00 in special classes for learning handicapped,
00 in special classes for the seriously emotionally disturbed,
and 00 in residential schools. The mean achievement rating
for the sample, as obtained from the follow-up questionnaires,
was 000, with a range of 1 to 5 and a standard deviation of
000. This indicates that the subjects were performing at about
the same academic levels as their peers in the post-discharge
classrooms. The mean socialization rating was 000 (range 1
to 5 and SD 000), indicating that the adolescents were performing
slightly above their peers in socialization. Table 2 provides
correlations among the various study variables.



INSERT TABLE 2 ABOUT HERE 



In order to examine the predictive validity of these gains,
multiple regression analyses for gain scores in reading and
math as predictive of the two outcome ratings were done in order
to control for variables such as age, IQ, beginning achievement
level, severity of behavior disorder, and length of stay. Prior
to this, an analysis of variance was done to determine if the
two teacher outcome ratings differed as a function of type of
classroom placement at follow-up.
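
A multiple regression of an outcome rating on a gain score, with covariates entered alongside it, can be sketched as follows. This is a minimal illustration under stated assumptions and not the analysis actually run: the function name `fit_ols` is hypothetical, only three of the named predictors (math gain, age, IQ) are included, and the data are synthetic and noise-free, generated from known coefficients so that the fit can be checked. No values from the study are used.

```python
import random


def fit_ols(predictors, outcome):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    predictors: list of rows, one tuple of predictor values per subject.
    Returns [intercept, b1, ..., bk].
    """
    rows = [[1.0] + list(p) for p in predictors]  # prepend intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, outcome))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


# Synthetic data: intercept, math gain, age, IQ (all values are made up).
rng = random.Random(0)
TRUE_COEFS = [1.0, 0.04, 0.02, 0.01]
predictors = [
    (rng.gauss(10, 15), rng.uniform(8, 17), rng.gauss(100, 15))
    for _ in range(150)
]
outcome = [
    TRUE_COEFS[0] + sum(b * x for b, x in zip(TRUE_COEFS[1:], row))
    for row in predictors
]

coefs = fit_ols(predictors, outcome)
print([round(c, 4) for c in coefs])
```

Because the covariates enter the same equation as the gain score, the gain coefficient describes its relation to the outcome with age and IQ held constant, which is the sense of "control" used in the text.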

TABLE 1

Mean, Range and Standard Deviation of Pre- and Post-
Achievement Scores and Gains in Each Area

                         Admission                  Discharge                    Gain
                   Mean  (Range)       SD     Mean  (Range)       SD     Mean  (Range)       SD

Reading Scores:
  Grade-Level      000   (000 to 000)  000    000   (000 to 000)  000    000   (000 to 000)  000
  Standard
Mathematics Scores:
  Grade-Level
  Standard


TABLE 2 



Correlations Among Study Variables 



Variables: 1. 2. 3. 4. 5. 6. 7. 8. 9. 

1. Reading Gains 0000 0000 0000 0000 0000 0000 0000 0000 0000

2. Math Gains 

3. IQ 

4. Age 

5. Length of Stay 

6. Diagnostic Severity 

7. Classroom Severity 

8. Academic Outcome 

9. Social Outcome 